Model Modules
StackingModel
- class pymaftools.model.StackingModel.OmicsStackingModel(omics_dict, class_order, base_model=<class 'sklearn.ensemble._forest.RandomForestClassifier'>, final_model=<class 'sklearn.linear_model._logistic.LogisticRegression'>, random_state=42)
Bases: object
Multi-omics stacking classifier.
Builds a StackingClassifier in which each base estimator operates on a single omics layer and a final meta-learner combines their predictions.
- Parameters:
omics_dict (dict[str, PivotTable]) – Mapping of omics names to PivotTable objects (features as index).
class_order (list[str]) – Ordered class labels used for encoding/decoding.
base_model (type, default RandomForestClassifier) – Class of the base estimator (instantiated once per omics layer).
final_model (type, default LogisticRegression) – Class of the final meta-learner.
random_state (int, default 42) – Random seed for reproducibility.
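The per-layer stacking idea can be sketched with plain scikit-learn. This is a minimal illustration, not pymaftools' implementation. It assumes each omics layer is given as a list of column names over one combined DataFrame, standing in for omics_dict, and `build_per_omics_stack` is a hypothetical helper. Each base pipeline keeps only its own layer's columns through a ColumnTransformer and then fits a RandomForestClassifier. A LogisticRegression meta-learner combines the stacked predict_proba outputs.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def build_per_omics_stack(omics_features, random_state=42):
    """Hypothetical helper: one base pipeline per omics layer.

    omics_features maps an omics name to the list of column names that
    belong to that layer (an assumed stand-in for omics_dict).
    """
    estimators = []
    for name, cols in omics_features.items():
        # Keep only this layer's columns; drop every other column.
        select = ColumnTransformer([("select", "passthrough", list(cols))], remainder="drop")
        model = RandomForestClassifier(n_estimators=50, random_state=random_state)
        estimators.append((name, Pipeline([("select", select), ("model", model)])))
    return StackingClassifier(
        estimators=estimators,
        final_estimator=LogisticRegression(max_iter=1000),
        stack_method="predict_proba",  # meta-learner sees class probabilities
    )


# Synthetic data: 12 features split into two made-up "omics" layers.
X_arr, y_arr = make_classification(
    n_samples=120, n_features=12, n_informative=6, n_classes=3, random_state=0
)
columns = [f"f{i}" for i in range(12)]
X = pd.DataFrame(X_arr, columns=columns)
y = pd.Series(np.array(["A", "B", "C"])[y_arr])
omics_features = {"rna": columns[:6], "cnv": columns[6:]}

stack = build_per_omics_stack(omics_features).fit(X, y)
proba = stack.predict_proba(X)  # shape (n_samples, n_classes)
labels = stack.predict(X)       # original string labels, not encoded ints

# Per-layer importances: the fitted forest inside the "rna" pipeline,
# indexed by that layer's column names.
rna_importance = pd.Series(
    stack.named_estimators_["rna"].named_steps["model"].feature_importances_,
    index=omics_features["rna"],
).sort_values(ascending=False)
```

Selecting columns inside each pipeline, rather than splitting X in advance, lets StackingClassifier run its internal cross-validated predictions on the full table. That matches the fit(X, y) signature below, where X holds all omics features as columns.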
- fit(X, y)
Fit the stacking model.
- Parameters:
X (pd.DataFrame) – Training data (samples as rows, all omics features as columns).
y (array-like) – Target labels.
- Return type:
- predict(X)
Predict class labels.
- Parameters:
X (pd.DataFrame) – Input data.
- Return type:
- Returns:
np.ndarray – Decoded class labels.
- predict_proba(X)
Predict class probabilities.
- Parameters:
X (pd.DataFrame) – Input data.
- Return type:
- Returns:
np.ndarray – Probability matrix of shape (n_samples, n_classes).
- get_omics_feature_importance(omics_key)
Get feature importances for a specific omics layer.
- Parameters:
omics_key (str) – Key in omics_dict identifying the omics layer.
- Return type:
Series
- Returns:
pd.Series – Feature importances indexed by feature names.
- get_omics_weights()
Return the weights of each omics layer in the final meta-learner.
- Return type:
DataFrame
- Returns:
pd.DataFrame – Weights with omics layers as rows. Includes abs_mean and abs_ratio columns for interpretability.
- Raises:
ValueError – If the model has not been fitted or the final estimator does not expose coef_.
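The sketch below shows one plausible reading of abs_mean and abs_ratio. The exact definitions in pymaftools are an assumption here. With predict_proba stacking, scikit-learn's StackingClassifier lays out the meta-features estimator by estimator: one column per class in the multiclass case, and a single column in the binary case. The final LogisticRegression's coef_ can therefore be split into equal-width blocks, one per omics layer. The hypothetical helper `omics_weights_from_coef` averages the absolute coefficients within each block and normalizes the results into shares. It omits any per-class columns the real method may return.

```python
import numpy as np
import pandas as pd


def omics_weights_from_coef(coef, omics_names):
    """Hypothetical summary of a meta-learner's coef_ into per-omics weights.

    Assumes coef columns form equal-width blocks, one per omics layer,
    in the same order as omics_names.
    """
    coef = np.asarray(coef, dtype=float)
    if coef.ndim != 2 or coef.shape[1] % len(omics_names) != 0:
        raise ValueError("coef_ columns do not split evenly into omics blocks")
    width = coef.shape[1] // len(omics_names)
    rows = {}
    for i, name in enumerate(omics_names):
        block = coef[:, i * width:(i + 1) * width]
        # Mean absolute coefficient over all classes and meta-features of this layer.
        rows[name] = {"abs_mean": np.abs(block).mean()}
    weights = pd.DataFrame.from_dict(rows, orient="index")
    # Share of the total absolute weight carried by each omics layer.
    weights["abs_ratio"] = weights["abs_mean"] / weights["abs_mean"].sum()
    return weights


# Toy 2-class-row x 4-column coefficient matrix: two layers, two columns each.
weights = omics_weights_from_coef([[1, -1, 2, 2], [0, 3, -2, 0]], ["rna", "cnv"])
```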
- plot_final_coefficients()
Plot the final meta-learner coefficients as a heatmap.
- Return type:
- evaluate(X, y_true, average='macro', show=True)
Evaluate classification performance.
- Parameters:
X (pd.DataFrame) – Input data.
y_true (array-like) – True labels.
average (str, default "macro") – Averaging strategy for multi-class metrics.
show (bool, default True) – Whether to print the metrics.
- Return type:
- Returns:
dict[str, float | None] – Dictionary with keys accuracy, f1, precision, recall, and roc_auc.
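A dictionary like the one evaluate returns can be approximated with scikit-learn's metric functions. This sketch assumes the columns of `y_proba` follow `classes`, as class_order does for the model. It uses one-vs-rest AUC when there are more than two classes and records roc_auc as None when scikit-learn raises a ValueError for it. The name `classification_metrics` is hypothetical.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def classification_metrics(y_true, y_pred, y_proba, classes, average="macro"):
    """Hypothetical metric dict: accuracy/f1/precision/recall/roc_auc."""
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average=average, labels=classes, zero_division=0),
        "precision": precision_score(y_true, y_pred, average=average, labels=classes, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, labels=classes, zero_division=0),
    }
    proba = np.asarray(y_proba, dtype=float)
    try:
        if len(classes) == 2:
            # Binary: treat classes[1] (the second proba column) as positive.
            y_bin = (np.asarray(y_true) == classes[1]).astype(int)
            metrics["roc_auc"] = roc_auc_score(y_bin, proba[:, 1])
        else:
            # roc_auc_score wants `labels` sorted, so reorder columns to match.
            order = np.argsort(classes)
            metrics["roc_auc"] = roc_auc_score(
                y_true,
                proba[:, order],
                multi_class="ovr",
                average=average,
                labels=np.asarray(classes)[order],
            )
    except ValueError:
        # Treat an AUC that scikit-learn rejects as missing.
        metrics["roc_auc"] = None
    return metrics


metrics = classification_metrics(
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7], [0.1, 0.9]],
    classes=[0, 1],
)
```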
- class pymaftools.model.StackingModel.ASCStackingModel(omics_dict, class_order, random_state=42)
Bases: OmicsStackingModel
Stacking model pre-configured for ASC (adenosquamous carcinoma) analysis.
- Parameters:
modelUtils
- pymaftools.model.modelUtils.get_importance(model)
Extract feature importance from a fitted model.
Supports sklearn estimators with feature_importances_ and OmicsStackingModel instances.
- Parameters:
model (object) – A fitted model.
- Return type:
Series
- Returns:
pd.Series – Feature importances indexed by feature names.
- Raises:
ValueError – If the model type is not supported.
- pymaftools.model.modelUtils.evaluate_model(model, X_test, y_test)
Evaluate a single model and return a metric dictionary.
- pymaftools.model.modelUtils.cross_validate_importance(X, y, model_func, model_name, n_seeds=5, n_splits=5, random_state_base=0, verbose=True, evaluate_func=None)
Run repeated stratified cross-validation, collecting feature importances and metrics.
- Parameters:
X (pd.DataFrame) – Feature matrix (samples as rows).
y (pd.Series) – Target labels.
model_func (callable) – Factory model_func(seed) -> model returning a fresh model instance.
model_name (str) – Name identifier for this model.
n_seeds (int, default 5) – Number of random seeds (repetitions).
n_splits (int, default 5) – Number of CV folds per seed.
random_state_base (int, default 0) – Base value added to each seed for reproducibility.
verbose (bool, default True) – Whether to display a progress bar.
evaluate_func (callable, optional) – Function (model, X_test, y_test) -> dict returning per-fold metrics.
- Return type:
tuple[pd.DataFrame, pd.DataFrame | None]
- Returns:
importance_df (pd.DataFrame) – Long-format feature importance table.
metric_df (pd.DataFrame or None) – Long-format metrics table (None if evaluate_func is not provided).
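The repeated stratified cross-validation loop can be sketched as follows. The sketch makes several assumptions:

- Each seed gets its own shuffled StratifiedKFold, seeded with random_state_base + seed.
- The model exposes feature_importances_.
- Metrics are stored one row per fold.

The name `cv_importance_sketch` is hypothetical, and the sketch omits the progress bar. Its importance columns (model, seed, fold, feature, importance) match those expected by to_importance_table.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold


def cv_importance_sketch(X, y, model_func, model_name, n_seeds=5, n_splits=5,
                         random_state_base=0, evaluate_func=None):
    """Hypothetical repeated stratified CV collecting long-format importances."""
    importance_rows, metric_rows = [], []
    for seed in range(n_seeds):
        # A differently shuffled split per seed gives repeated stratified CV.
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True,
                             random_state=random_state_base + seed)
        for fold, (train_idx, test_idx) in enumerate(cv.split(X, y)):
            model = model_func(seed)  # fresh model for every fold
            model.fit(X.iloc[train_idx], y.iloc[train_idx])
            for feature, value in zip(X.columns, model.feature_importances_):
                importance_rows.append({"model": model_name, "seed": seed, "fold": fold,
                                        "feature": feature, "importance": value})
            if evaluate_func is not None:
                scores = evaluate_func(model, X.iloc[test_idx], y.iloc[test_idx])
                metric_rows.append({"model": model_name, "seed": seed, "fold": fold, **scores})
    importance_df = pd.DataFrame(importance_rows)
    metric_df = pd.DataFrame(metric_rows) if evaluate_func is not None else None
    return importance_df, metric_df


X_arr, y_arr = make_classification(n_samples=60, n_features=5, random_state=0)
X = pd.DataFrame(X_arr, columns=list("abcde"))
y = pd.Series(y_arr)
importance_df, metric_df = cv_importance_sketch(
    X, y,
    model_func=lambda seed: RandomForestClassifier(n_estimators=10, random_state=seed),
    model_name="rf",
    n_seeds=2,
    n_splits=3,
    evaluate_func=lambda m, X_test, y_test: {"acc": m.score(X_test, y_test)},
)
```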
- pymaftools.model.modelUtils.plot_metric_comparison_with_annotation(data, metrics=None, group_col='model', order=None, palette='Set2', test='Mann-Whitney', alpha=0.8, fontsize=14, figsize=None, title_prefix=None, save_path=None, **save_kwargs)
Plot metric comparison boxplots with statistical annotations.
- Parameters:
data (pd.DataFrame) – DataFrame containing model metrics.
metrics (list[str], optional) – Metric column names to plot. Default ["acc", "f1", "auc"].
group_col (str, default "model") – Column used for grouping.
palette (str, default "Set2") – Seaborn color palette.
test (str, default "Mann-Whitney") – Statistical test for annotations.
alpha (float, default 0.8) – Box transparency.
fontsize (int, default 14) – Font size.
figsize (tuple, optional) – Figure size.
title_prefix (str, optional) – Title prefix (None disables titles).
save_path (str, optional) – Path to save the figure.
**save_kwargs – Additional arguments passed to the save method.
- Returns:
ModelPlot – The plotter instance.
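The statistical annotations depend on pairwise group comparisons. Assuming the default test is the two-sided Mann-Whitney U test, the p-values could be computed with SciPy as below. The helper `pairwise_mannwhitney` is hypothetical. It does not draw the plot or apply any multiple-testing correction.

```python
from itertools import combinations

import pandas as pd
from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(data, metric, group_col="model"):
    """Hypothetical: two-sided Mann-Whitney U p-value for every pair of groups."""
    groups = {name: g[metric].dropna().to_numpy() for name, g in data.groupby(group_col)}
    rows = []
    for a, b in combinations(sorted(groups), 2):
        result = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        rows.append({"group1": a, "group2": b, "metric": metric, "p_value": result.pvalue})
    return pd.DataFrame(rows)


# Toy per-fold accuracies for two models whose ranges do not overlap.
data = pd.DataFrame({
    "model": ["rf"] * 10 + ["lr"] * 10,
    "acc": [0.80 + 0.01 * i for i in range(10)] + [0.60 + 0.01 * i for i in range(10)],
})
pvals = pairwise_mannwhitney(data, "acc")
```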
- pymaftools.model.modelUtils.to_importance_table(all_importance_df, omic)
Convert long-format importance data to a sorted PivotTable.
- Parameters:
all_importance_df (pd.DataFrame) – Long-format importance DataFrame with columns model, seed, fold, feature, importance.
omic (str) – Omics name to filter by.
- Return type:
- Returns:
PivotTable – Feature x seed matrix sorted by mean importance (descending).
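The reshaping step in to_importance_table could look like the sketch below, which returns a plain pandas DataFrame rather than a PivotTable. It assumes omic is matched against the model column and that the folds for each seed are averaged into a single value. Both points are assumptions, as is the helper name `importance_table_sketch`.

```python
import pandas as pd


def importance_table_sketch(all_importance_df, omic):
    """Hypothetical feature x seed matrix (mean over folds), sorted by mean importance."""
    subset = all_importance_df[all_importance_df["model"] == omic]
    table = subset.pivot_table(index="feature", columns="seed",
                               values="importance", aggfunc="mean")
    # Order rows by each feature's mean importance across seeds, highest first.
    order = table.mean(axis=1).sort_values(ascending=False).index
    return table.loc[order]


records = [
    ("rna", 0, 0, "g1", 0.2), ("rna", 0, 1, "g1", 0.4),
    ("rna", 0, 0, "g2", 0.8), ("rna", 0, 1, "g2", 0.6),
    ("rna", 1, 0, "g1", 0.1), ("rna", 1, 0, "g2", 0.9),
    ("cnv", 0, 0, "c1", 1.0),
]
long_df = pd.DataFrame(records, columns=["model", "seed", "fold", "feature", "importance"])
table = importance_table_sketch(long_df, "rna")
```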
- pymaftools.model.modelUtils.plot_top_feature_importance_heatmap(mean_importance_df, omic, top_n=20, cmap='viridis', figsize=(10, 6), title=None, save_path=None, **save_kwargs)
Plot a heatmap of the top-N most important features.
- Parameters:
mean_importance_df (pd.DataFrame) – Feature importance data.
omic (str) – Omics name identifier.
top_n (int, default 20) – Number of top features to display.
cmap (str, default "viridis") – Colormap for the heatmap.
figsize (tuple, default (10, 6)) – Figure size.
title (str, optional) – Plot title (None disables the title).
save_path (str, optional) – Path to save the figure.
**save_kwargs – Additional arguments passed to the save method.
- Returns:
ModelPlot – The plotter instance.
- pymaftools.model.modelUtils.run_rfecv_feature_selection(pivot, label_col='subtype', estimator=None, step=10, scoring='accuracy', min_features_to_select=10, plot=True, random_state=42, title=None, save_path=None, **save_kwargs)
Run RFECV feature selection on a PivotTable.
- Parameters:
pivot (PivotTable) – Feature x sample table.
label_col (str, default "subtype") – Column in sample_metadata containing target labels.
estimator (sklearn estimator, optional) – Model to use (default: RandomForestClassifier).
step (int, default 10) – Number of features removed per iteration.
scoring (str, default "accuracy") – Scoring metric (e.g. "accuracy", "f1_macro").
min_features_to_select (int, default 10) – Minimum number of features to keep.
plot (bool, default True) – Whether to plot the performance curve.
random_state (int, default 42) – Random seed.
title (str, optional) – Plot title (None disables the title).
save_path (str, optional) – Path to save the figure.
**save_kwargs – Additional arguments passed to the save method.
- Return type:
- Returns:
selected_features (list[str]) – Selected feature names.
selector (RFECV) – Fitted RFECV object.
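Here is a sketch of the same RFECV workflow using scikit-learn directly. It assumes a features x samples DataFrame plus a separate label Series indexed by sample, standing in for a PivotTable and its sample_metadata[label_col]. The table is transposed because RFECV expects samples as rows. The 5-fold stratified CV, the 50-tree forest, and the helper name `rfecv_on_feature_table` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold


def rfecv_on_feature_table(table, labels, step=10, scoring="accuracy",
                           min_features_to_select=10, random_state=42):
    """Hypothetical: table has features as rows and samples as columns."""
    X = table.T                # sklearn expects samples as rows
    y = labels.loc[X.index]    # align labels to the sample order
    selector = RFECV(
        estimator=RandomForestClassifier(n_estimators=50, random_state=random_state),
        step=step,             # features dropped per elimination round
        cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state),
        scoring=scoring,
        min_features_to_select=min_features_to_select,
    )
    selector.fit(X, y)
    selected = list(X.columns[selector.support_])
    return selected, selector


X_arr, y_arr = make_classification(n_samples=80, n_features=40, n_informative=5, random_state=0)
samples = [f"s{j}" for j in range(80)]
table = pd.DataFrame(X_arr.T, index=[f"gene{i}" for i in range(40)], columns=samples)
labels = pd.Series(np.where(y_arr == 1, "B", "A"), index=samples)
selected, selector = rfecv_on_feature_table(table, labels)
```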
- pymaftools.model.modelUtils.run_model_evaluation(model_configs, y, n_seeds=100, n_splits=5, evaluate_func=None, verbose=True)
Run cross-validation and importance analysis for multiple models.
- Parameters:
model_configs (list[dict]) – Each dict must have keys "name" (str), "model_func" (callable), and "X" (pd.DataFrame).
y (pd.Series) – Target labels.
n_seeds (int, default 100) – Number of random seeds.
n_splits (int, default 5) – Number of CV folds.
evaluate_func (callable, optional) – Evaluation function (model, X_test, y_test) -> dict.
verbose (bool, default True) – Whether to print progress.
- Return type:
tuple[dict, pd.DataFrame, pd.DataFrame]
- Returns:
result_dict (dict) – Per-model results with "importance" and "metrics" keys.
all_importance_df (pd.DataFrame) – Combined long-format feature importance data.
all_metrics_df (pd.DataFrame) – Combined long-format classification metrics.
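Below is a compact driver over model_configs. It uses scikit-learn's cross_validate with RepeatedStratifiedKFold in place of a per-seed loop and collects only metrics, not importances. The config keys follow the documented "name", "model_func", "X" layout. The helper `run_configs_sketch`, the acc/f1 column names, and calling model_func once with seed 0 are all assumptions. Because cross_validate clones that single estimator for every fold, all folds share one random_state. The acc and f1 columns line up with the default metrics of plot_metric_comparison_with_annotation.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate


def run_configs_sketch(model_configs, y, n_seeds=3, n_splits=5):
    """Hypothetical: one row per (model, fold) with accuracy and macro F1."""
    frames = []
    for cfg in model_configs:
        # n_seeds repeats x n_splits folds, each repeat shuffled differently.
        cv = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_seeds, random_state=0)
        scores = cross_validate(cfg["model_func"](0), cfg["X"], y, cv=cv,
                                scoring=["accuracy", "f1_macro"])
        frames.append(pd.DataFrame({
            "model": cfg["name"],
            "acc": scores["test_accuracy"],
            "f1": scores["test_f1_macro"],
        }))
    return pd.concat(frames, ignore_index=True)


X_arr, y_arr = make_classification(n_samples=90, n_features=10, random_state=0)
X_all = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(10)])
y = pd.Series(y_arr)
configs = [
    {"name": "layer1_rf",
     "model_func": lambda seed: RandomForestClassifier(n_estimators=20, random_state=seed),
     "X": X_all.iloc[:, :5]},
    {"name": "all_lr",
     "model_func": lambda seed: LogisticRegression(max_iter=1000),
     "X": X_all},
]
metrics_df = run_configs_sketch(configs, y, n_seeds=2, n_splits=3)
```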