Utility Modules
geneset
- pymaftools.utils.geneset.read_GMT(filepath)[source]
Read a GMT (Gene Matrix Transposed) file into a DataFrame.
- Parameters:
filepath (str) – Path to the GMT file.
- Return type:
DataFrame
- Returns:
pd.DataFrame – DataFrame indexed by pathway name, with columns Link and Genes.
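For readers unfamiliar with the format, each GMT line is tab-separated: a set name, a description field (often a URL), then the member genes. The following is a minimal standard-library sketch of that parsing step, not the pymaftools implementation; the helper name `parse_gmt` and the two gene sets are made up for illustration.

```python
import io


def parse_gmt(handle):
    """Parse GMT text into {pathway_name: {"Link": str, "Genes": list[str]}}.

    Hypothetical helper for illustration; not part of pymaftools.
    """
    records = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        name, link, *genes = fields
        # Drop empty trailing fields left by stray tabs.
        records[name] = {"Link": link, "Genes": [g for g in genes if g]}
    return records


# Two made-up gene sets in GMT layout (name, link, genes...).
example = io.StringIO(
    "SET_A\thttp://example.org/SET_A\tTP53\tBAX\tCASP3\n"
    "SET_B\thttp://example.org/SET_B\tMYC\tE2F1\n"
)
gene_sets = parse_gmt(example)
print(gene_sets["SET_A"]["Genes"])
```

A dictionary of this shape can be turned into a table indexed by pathway name with `pandas.DataFrame.from_dict(gene_sets, orient="index")`, which matches the layout described above.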
- pymaftools.utils.geneset.fetch_msigdb_geneset(geneset_name, species='human')[source]
Fetch a gene set from MSigDB by scraping its HTML page.
- Parameters:
geneset_name (str) – Name of the gene set on MSigDB (e.g. "HALLMARK_APOPTOSIS").
species (str, default "human") – Species identifier used in the MSigDB URL.
- Return type:
DataFrame
- Returns:
pd.DataFrame – DataFrame with columns source_id, entrez_id, gene_symbol, and description.
reduction
- class pymaftools.utils.reduction.PCA_CCA(n_pca_components=20, n_cca_components=1, random_state=42)[source]
Bases: object
Perform PCA on SNV and CNV tables separately, followed by Canonical Correlation Analysis (CCA).
This class provides utilities to project two omics data tables (SNV and CNV) into a shared latent space using PCA for dimensionality reduction and CCA for capturing cross-omics correlations. It also allows mapping the canonical weights back to the original feature space for interpretation.
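- Parameters:
n_pca_components (int, default 20) – Number of PCA components.
n_cca_components (int, default 1) – Number of CCA components.
random_state (int, default 42) – Random seed.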
- fit_transform(snv_table, cnv_table)[source]
Fit PCA on SNV and CNV tables, then fit CCA on the reduced embeddings.
- Parameters:
snv_table (pd.DataFrame) – SNV data table with features as rows and samples as columns.
cnv_table (pd.DataFrame) – CNV data table with features as rows and samples as columns.
- Returns:
cca_snv (ndarray of shape (n_samples, n_cca_components)) – Canonical variates for SNV data.
cca_cnv (ndarray of shape (n_samples, n_cca_components)) – Canonical variates for CNV data.
- transform(snv_table, cnv_table)[source]
Project new SNV and CNV data into the canonical space using fitted PCA and CCA models.
- Parameters:
snv_table (pd.DataFrame) – SNV data table with features as rows and samples as columns.
cnv_table (pd.DataFrame) – CNV data table with features as rows and samples as columns.
- Returns:
cca_snv (ndarray of shape (n_samples, n_cca_components)) – Canonical variates for SNV data.
cca_cnv (ndarray of shape (n_samples, n_cca_components)) – Canonical variates for CNV data.
- get_weights()[source]
Retrieve feature weights in the canonical variates.
This method back-projects the CCA weights from the PCA-reduced space into the original feature space, enabling interpretation of which features contribute most to the canonical correlation.
- Returns:
df_snv (pd.DataFrame) – DataFrame of SNV feature weights in canonical components.
df_cnv (pd.DataFrame) – DataFrame of CNV feature weights in canonical components.
- Raises:
ValueError – If the model has not been fitted yet.
geneinfo
- pymaftools.utils.geneinfo.get_ncbi_gene_ID(gene_symbol)[source]
Query the NCBI Entrez API for a gene symbol and return its Gene ID.
- pymaftools.utils.geneinfo.get_ncbi_gene_IDs(gene_symbols)[source]
Batch-query gene symbols and return their corresponding Gene IDs.
- pymaftools.utils.geneinfo.get_gene_info_json(gene_ids)[source]
Retrieve detailed gene information from NCBI for multiple Gene IDs.
- pymaftools.utils.geneinfo.parse_gene_info(gene_info)[source]
Extract summary descriptions from detailed gene information.
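As a rough illustration of the symbol-to-ID lookup, the sketch below builds a query URL for the NCBI E-utilities ESearch endpoint and reads the ID list from a JSON response. It is an assumption that get_ncbi_gene_ID works this way; the helper names, the search term, and the human-only organism filter are hypothetical, and the response is a canned string so the example runs without network access.

```python
import json
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_gene_query_url(gene_symbol, organism="Homo sapiens"):
    """Build an ESearch URL for a gene symbol (hypothetical helper)."""
    params = {
        "db": "gene",
        "term": f"{gene_symbol}[Gene Name] AND {organism}[Organism]",
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def first_gene_id(esearch_json):
    """Return the first Gene ID in an ESearch JSON response, or None."""
    id_list = json.loads(esearch_json)["esearchresult"]["idlist"]
    return id_list[0] if id_list else None


# Canned response in the shape ESearch returns with retmode=json.
canned = '{"esearchresult": {"count": "1", "idlist": ["7157"]}}'

print(build_gene_query_url("TP53"))
print(first_gene_id(canned))
```

In real use the URL would be fetched with an HTTP client, and batch lookups should respect NCBI's request-rate limits.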