Utilities

exception palantir.utils.CellNotFoundException

Bases: Exception

Exception raised when no cell can be determined by the method used.

palantir.utils.compute_kernel(data: DataFrame | AnnData, knn: int = 30, alpha: float = 0, pca_key: str = 'X_pca', kernel_key: str = 'DM_Kernel', backend: str | None = None) → csr_matrix

Compute the adaptive anisotropic diffusion kernel.

Parameters:

data (Union[pd.DataFrame, AnnData]) – Data points (rows) in a feature space (columns) for pd.DataFrame. For AnnData, the PCA projections in obsm[pca_key] are used.
knn (int) – Number of nearest neighbors for adaptive kernel calculation. Default is 30.
alpha (float) – Normalization parameter for the diffusion operator. Default is 0.
pca_key (str, optional) – Key to retrieve PCA projections from data if it is an AnnData object. Default is ‘X_pca’.
kernel_key (str, optional) – Key to store the kernel in obsp of data if it is an AnnData object. Default is ‘DM_Kernel’.
backend (str, optional) – Kernel construction backend: “scanpy” (matches prior behavior; approximate kNN via scanpy/UMAP) or “sklearn” (exact kNN; results may differ slightly from the scanpy backend, and it can be slower on large or high-dimensional data). Defaults to palantir.config.KERNEL_BACKEND.

Returns:

Computed kernel matrix.

Return type:

csr_matrix
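For intuition, here is a small dense NumPy sketch of an adaptive anisotropic kernel. It is not palantir's implementation: it assumes exact kNN on a small in-memory matrix, takes each cell's bandwidth σ_i to be its distance to its (knn // 3)-th nearest neighbour, uses exp(-d_ij / σ_i) as the affinity, symmetrizes with W + Wᵀ, and applies alpha as a D^-alpha K D^-alpha normalization. The function name adaptive_kernel_dense is hypothetical.

```python
import numpy as np


def adaptive_kernel_dense(X: np.ndarray, knn: int = 30, alpha: float = 0.0) -> np.ndarray:
    """Hypothetical dense sketch of an adaptive anisotropic kNN kernel (small data only)."""
    n = X.shape[0]
    # Exact pairwise Euclidean distances; O(n^2) memory, so illustration only.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # knn nearest neighbours of every cell, skipping the cell itself (column 0).
    neighbours = np.argsort(dist, axis=1)[:, 1 : knn + 1]
    # Adaptive bandwidth: distance to the (knn // 3)-th neighbour (assumed rule).
    adaptive_k = max(knn // 3, 1)
    sigma = dist[np.arange(n), neighbours[:, adaptive_k - 1]]
    # Asymmetric affinities, non-zero only on the kNN graph.
    rows = np.repeat(np.arange(n), knn)
    cols = neighbours.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist[rows, cols] / sigma[rows])
    # Symmetrize so the kernel is an undirected affinity matrix.
    K = W + W.T
    # Optional density normalization controlled by alpha.
    if alpha > 0:
        d = K.sum(axis=1)
        d[d != 0] = d[d != 0] ** (-alpha)
        K = d[:, None] * K * d[None, :]
    return K


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
K = adaptive_kernel_dense(X, knn=12)
```

The sketch requires knn < n. With an AnnData object, the documented call palantir.utils.compute_kernel(ad, knn=30) stores its kernel in ad.obsp['DM_Kernel'] by default.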

palantir.utils.determine_multiscale_space(dm_res: dict | AnnData, n_eigs: int | None = None, eigval_key: str = 'DM_EigenValues', eigvec_key: str = 'DM_EigenVectors', out_key: str = 'DM_EigenVectors_multiscaled') → DataFrame | None

Determine the multi-scale space of the data.

Parameters:

dm_res (Union[dict, AnnData]) – Diffusion map results from run_diffusion_maps. If AnnData is passed, its uns[eigval_key] and obsm[eigvec_key] are used.
n_eigs (Union[int, None], optional) – Number of eigenvectors to use. If None, the number of eigenvectors is determined using the eigengap. Default is None.
eigval_key (str, optional) – Key to retrieve EigenValues from dm_res if it is an AnnData object. Default is ‘DM_EigenValues’.
eigvec_key (str, optional) – Key to retrieve EigenVectors from dm_res if it is an AnnData object. Default is ‘DM_EigenVectors’.
out_key (str, optional) – Key to store the result in obsm of dm_res if it is an AnnData object. Default is ‘DM_EigenVectors_multiscaled’.

Returns:

Multi-scale data matrix. If AnnData is passed as dm_res, the result is written to its obsm[out_key] and None is returned.

Return type:

Union[pd.DataFrame, None]
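The multi-scale step can be sketched in NumPy as follows. This is an illustration under stated assumptions, not palantir's code: it assumes the eigenvalues are sorted in descending order with the trivial component first, picks n_eigs at the largest eigengap (falling back to the second-largest gap if that would leave fewer than 3), drops the first component, and rescales each remaining eigenvector by λ / (1 − λ). The name multiscale_space is hypothetical.

```python
import numpy as np


def multiscale_space(eigvecs: np.ndarray, eigvals: np.ndarray, n_eigs=None) -> np.ndarray:
    """Hypothetical sketch: rescale diffusion components by lambda / (1 - lambda)."""
    eigvals = np.asarray(eigvals, dtype=float)
    if n_eigs is None:
        # Gaps between consecutive eigenvalues (assumed sorted in descending order).
        gaps = eigvals[:-1] - eigvals[1:]
        order = np.argsort(gaps)
        n_eigs = int(order[-1]) + 1
        if n_eigs < 3:
            # Too few components: fall back to the second-largest gap.
            n_eigs = int(order[-2]) + 1
    # Skip component 0, the trivial eigenvector with eigenvalue 1.
    use = np.arange(1, n_eigs)
    lam = eigvals[use]
    return eigvecs[:, use] * (lam / (1 - lam))


vals = np.array([1.0, 0.9, 0.8, 0.3, 0.2])
vecs = np.ones((4, 5))
ms = multiscale_space(vecs, vals)
```

Here the largest gap lies between 0.8 and 0.3, so components 1 and 2 are kept and scaled by 0.9/0.1 = 9 and 0.8/0.2 = 4. With an AnnData object, the documented function instead writes its result to obsm[out_key] and returns None.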

palantir.utils.diffusion_maps_from_kernel(kernel: csr_matrix, n_components: int = 10, seed: int | None = 0) → Dict[str, csr_matrix | DataFrame | Series]

Compute the diffusion map given a kernel matrix.

Parameters:

kernel (csr_matrix) – Precomputed kernel matrix.
n_components (int) – Number of diffusion components to compute. Default is 10.
seed (Union[int, None]) – Seed for random initialization. Default is 0.

Returns:

Dictionary containing:
- T: Transition matrix (csr_matrix)
- EigenVectors: Diffusion components (pd.DataFrame)
- EigenValues: Corresponding eigenvalues (pd.Series)

Return type:

Dict[str, Union[csr_matrix, pd.DataFrame, pd.Series]]
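A dense NumPy sketch of the same computation may help. It assumes a small, symmetric, strictly positive kernel, builds the row-stochastic transition matrix T = D⁻¹K, and obtains T's right eigenvectors through the symmetric matrix D^(-1/2) K D^(-1/2), which has the same eigenvalues. The documented function takes a csr_matrix and a seed for random initialization instead; diffusion_map_dense is a hypothetical name, and it returns NumPy arrays rather than pandas objects.

```python
import numpy as np


def diffusion_map_dense(K: np.ndarray, n_components: int = 10) -> dict:
    """Hypothetical dense sketch: transition matrix plus top eigenpairs."""
    d = K.sum(axis=1)
    # Row-normalize the kernel into a Markov transition matrix.
    T = K / d[:, None]
    # Symmetric conjugate S = D^-1/2 K D^-1/2 shares T's eigenvalues.
    d_isqrt = 1.0 / np.sqrt(d)
    S = d_isqrt[:, None] * K * d_isqrt[None, :]
    vals, vecs = np.linalg.eigh(S)
    # Keep the largest eigenvalues, in descending order.
    top = np.argsort(vals)[::-1][:n_components]
    vals = vals[top]
    # Map back to right eigenvectors of T and give each unit norm.
    V = d_isqrt[:, None] * vecs[:, top]
    V /= np.linalg.norm(V, axis=0)
    return {"T": T, "EigenVectors": V, "EigenValues": vals}


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq)  # simple Gaussian kernel for the demo
res = diffusion_map_dense(K, n_components=5)
```

Working through the symmetric matrix lets np.linalg.eigh return real eigenpairs directly; the leading eigenvalue is 1 with a constant eigenvector, which is the trivial component that the multi-scale step discards.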

palantir.utils.early_cell(ad: AnnData, celltype: str, celltype_column: str = 'celltype', eigvec_key: str = 'DM_EigenVectors_multiscaled', fallback_seed: int = None)

Helper function to determine ‘early_cell’ for ‘run_palantir’. It identifies a cell of type ‘celltype’ at an extreme of the state space represented by the diffusion maps.

Parameters:

ad (AnnData) – Annotated data matrix.
celltype (str) – The specific cell type of interest for determining the early cell.
celltype_column (str, optional) – Name of the column in the obs of the AnnData object where the cell type information is stored. Default is ‘celltype’.
eigvec_key (str, optional) – Key to access multiscale space diffusion components from obsm of ad. Default is ‘DM_EigenVectors_multiscaled’.
fallback_seed (int, optional) – Seed for the random number generator in the fallback method. If not specified, the fallback method is not applied and a CellNotFoundException is raised instead. Default is None.

Returns:

Name of the early cell for the given cell type.

Return type:

str

Raises:

CellNotFoundException – If no valid cell of the specified type can be found at the extremes of the diffusion map.
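The extreme-cell search can be sketched with pandas as follows. This assumes a simplified rule (scan each diffusion component, check the cells at its maximum and minimum, and return the first one whose label matches celltype) and omits the fallback method; extreme_cell_of_type and CellNotFound are hypothetical stand-ins, not palantir names.

```python
import numpy as np
import pandas as pd


class CellNotFound(Exception):
    """Hypothetical stand-in for palantir.utils.CellNotFoundException."""


def extreme_cell_of_type(components: pd.DataFrame, labels: pd.Series, celltype: str) -> str:
    """Return the first cell of `celltype` found at an extreme of any component."""
    # Align labels with the rows of the component matrix.
    labels = labels.reindex(components.index)
    for col in components.columns:
        values = components[col].to_numpy()
        # Check the cell at the maximum, then the one at the minimum.
        for idx in (int(np.argmax(values)), int(np.argmin(values))):
            if labels.iloc[idx] == celltype:
                return components.index[idx]
    raise CellNotFound(f"No {celltype!r} cell at the extremes of any component.")


cells = ["c0", "c1", "c2", "c3"]
comps = pd.DataFrame(
    {"DC1": [0.5, -1.0, 2.0, 0.1], "DC2": [3.0, 0.0, 0.2, -2.0]}, index=cells
)
labels = pd.Series(["HSC", "Ery", "Mono", "Ery"], index=cells)
print(extreme_cell_of_type(comps, labels, "Ery"))  # c1
```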

palantir.utils.fallback_terminal_cell(ad: AnnData, celltype: str, celltype_column: str = 'anno', eigvec_key: str = 'DM_EigenVectors_multiscaled', seed: int = 2353)

Fallback method to identify terminal cells when no valid diffusion component is found for the specified cell type.

Parameters:

ad (AnnData) – Annotated data matrix.
celltype (str) – The specific cell type of interest for determining the terminal cell.
celltype_column (str, optional) – Name of the column in the obs of the AnnData object where the cell type information is stored. Default is ‘anno’.
eigvec_key (str, optional) – Key to access multiscale space diffusion components from obsm of ad. Default is ‘DM_EigenVectors_multiscaled’.
seed (int, optional) – Seed for the random number generator in the fallback method. If None, no seed is used. Default is 2353.

Returns:

Name of the terminal cell for the given cell type.

Return type:

str

palantir.utils.find_terminal_states(ad: AnnData, celltypes: Iterable, celltype_column: str = 'celltype', eigvec_key: str = 'DM_EigenVectors_multiscaled', fallback_seed: int = None)

Identifies terminal states for a list of cell types in the AnnData object.

This function iterates over the provided cell types, trying to find a terminal cell for each one using the ‘early_cell’ function. If no valid component is found for a cell type, it emits a warning and proceeds to the next cell type.

Parameters:

ad (AnnData) – Annotated data matrix from Scanpy. It should contain computed diffusion maps.
celltypes (Iterable) – An iterable such as a list or tuple of cell type names for which terminal states should be identified.
celltype_column (str, optional) – The name of the column in the obs dataframe of the AnnData object where the cell type information is stored. Default is ‘celltype’.
eigvec_key (str, optional) – Key to access multiscale space diffusion components from obsm of ad. Default is ‘DM_EigenVectors_multiscaled’.
fallback_seed (int, optional) – Seed for the random number generator in the fallback method. If not specified, the fallback method is not applied; cell types without a valid terminal cell are skipped with a warning and omitted from the result. Default is None.

Returns:

A pandas Series where the indices are the cell types and the values are the names of the terminal cells. If no terminal cell is found for a cell type, it will not be included in the series.

Return type:

pd.Series
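The skip-and-warn loop described above amounts to the pattern below. find_cell stands for whatever per-cell-type search is used (‘early_cell’ in palantir); collect_terminal_states, lookup, and CellNotFound are hypothetical names chosen for this sketch.

```python
import warnings
from typing import Callable, Iterable

import pandas as pd


class CellNotFound(Exception):
    """Hypothetical stand-in for palantir.utils.CellNotFoundException."""


def collect_terminal_states(
    celltypes: Iterable[str], find_cell: Callable[[str], str]
) -> pd.Series:
    """Map each cell type to a terminal cell, skipping failures with a warning."""
    ends = {}
    for celltype in celltypes:
        try:
            ends[celltype] = find_cell(celltype)
        except CellNotFound:
            warnings.warn(f"No terminal cell found for {celltype!r}; skipping it.")
    return pd.Series(ends, dtype=object)


# Toy per-cell-type search: only two cell types have a known terminal cell.
known = {"Ery": "cell_17", "Mono": "cell_42"}


def lookup(celltype: str) -> str:
    if celltype not in known:
        raise CellNotFound(celltype)
    return known[celltype]
```

Calling collect_terminal_states(["Ery", "DC", "Mono"], lookup) yields a Series indexed by "Ery" and "Mono" and emits one warning for "DC".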

palantir.utils.run_density(ad: AnnData, repr_key: str = 'DM_EigenVectors', density_key: str = 'mellon_log_density', **kwargs) → ndarray

Compute cell-state density with Mellon.

This function uses the Mellon algorithm to compute the density of cell states, which is stored in ad.obs[density_key]. The function returns the computed density. If repr_key is not found in ad.obsm, an error is raised suggesting that the user first run palantir.utils.run_diffusion_maps(ad).

Additionally, the density prediction model is serialized and stored in the .uns attribute of the AnnData object under the key ‘{density_key}_predictor’. It can be deserialized with the mellon.Predictor.from_dict() method and used for prediction.

Parameters:

ad (AnnData) – AnnData object containing the gene expression data and pseudotime.
repr_key (str, optional) – Key to retrieve cell-state representation from the AnnData object. Default is ‘DM_EigenVectors’.
density_key (str, optional) – Key under which the computed density values are stored in the obs of the AnnData object. Default is ‘mellon_log_density’.
**kwargs (dict) – Additional keyword arguments to be passed to mellon.DensityEstimator.

Returns:

log_density – A numpy array of log density values computed for each cell.

Return type:

np.ndarray

Raises:

ValueError – If repr_key is not found in ad.obsm.

palantir.utils.run_density_evaluation(in_ad: AnnData, out_ad: AnnData, predictor_key: str = 'mellon_log_density_predictor', repr_key: str = 'DM_EigenVectors', density_key: str = 'cross_log_density', **kwargs) → ndarray

Evaluates the density function of in_ad.uns[predictor_key] on the representations of out_ad.obsm[repr_key].

Parameters:

in_ad (AnnData) – AnnData object containing the gene expression data and the serialized predictor.
out_ad (AnnData) – AnnData object containing the gene expression data and representations to be evaluated.
predictor_key (str, optional) – Key to access the predictor in the uns of the in_ad AnnData object. Default is ‘mellon_log_density_predictor’.
repr_key (str, optional) – Key to access representations in the obsm of the out_ad AnnData object. Default is ‘DM_EigenVectors’.
density_key (str, optional) – Key under which the computed density values are stored in the obs of the out_ad AnnData object. Default is ‘cross_log_density’.
**kwargs (dict) – Additional keyword arguments, unused in this function.

Returns:

log_density – A numpy array of log density values computed for each cell in out_ad.

Return type:

np.ndarray

Raises:

ValueError – If repr_key is not found in out_ad.obsm or predictor_key is not found in in_ad.uns.

palantir.utils.run_diffusion_maps(data: DataFrame | AnnData, n_components: int = 10, knn: int = 30, alpha: float = 0, seed: int | None = 0, kernel_backend: str = 'scanpy', pca_key: str = 'X_pca', kernel_key: str = 'DM_Kernel', sim_key: str = 'DM_Similarity', eigval_key: str = 'DM_EigenValues', eigvec_key: str = 'DM_EigenVectors') → Dict[str, csr_matrix | DataFrame | Series]

Run diffusion maps using the adaptive anisotropic kernel.

Parameters:

data (Union[pd.DataFrame, AnnData]) – PCA projections of the data or adjacency matrix. If AnnData is passed, its obsm[pca_key] is used and the results are written to its obsp[kernel_key], obsp[sim_key], obsm[eigvec_key], and uns[eigval_key].
n_components (int, optional) – Number of diffusion components. Default is 10.
knn (int, optional) – Number of nearest neighbors for graph construction. Default is 30.
alpha (float, optional) – Normalization parameter for the diffusion operator. Default is 0.
seed (Union[int, None], optional) – NumPy random seed. If None, the result is randomized; set it to an integer for reproducibility. Default is 0.
kernel_backend (str, optional) – Kernel construction backend: “scanpy” (matches prior behavior; approximate kNN via scanpy/UMAP) or “sklearn” (exact kNN; results may differ slightly from the scanpy backend, and it can be slower on large or high-dimensional data). Defaults to “scanpy”.
pca_key (str, optional) – Key to retrieve PCA projections from data if it is an AnnData object. Default is ‘X_pca’.
kernel_key (str, optional) – Key to store the kernel in obsp of data if it is an AnnData object. Default is ‘DM_Kernel’.
sim_key (str, optional) – Key to store the similarity in obsp of data if it is an AnnData object. Default is ‘DM_Similarity’.
eigval_key (str, optional) – Key to store the EigenValues in uns of data if it is an AnnData object. Default is ‘DM_EigenValues’.
eigvec_key (str, optional) – Key to store the EigenVectors in obsm of data if it is an AnnData object. Default is ‘DM_EigenVectors’.

Returns:

Dictionary containing:
- kernel: Computed kernel matrix
- T: Transition matrix
- EigenVectors: Diffusion components
- EigenValues: Corresponding eigenvalues

If AnnData is passed as data, these results are also written to the input object.

Return type:

Dict[str, Union[csr_matrix, pd.DataFrame, pd.Series]]

palantir.utils.run_local_variability(ad: AnnData, expression_key: str = 'MAGIC_imputed_data', distances_key: str = 'distances', localvar_key: str = 'local_variability', progress: bool = False, eps: float = 1e-16) → ndarray

Compute local gene variability scores for each cell.

This function calculates the variability in gene expression in a local neighbourhood for each cell. It adds the result to the layers of the given AnnData object under the specified key.

Parameters:

ad (AnnData) – AnnData object containing the gene expression data and pseudotime.
expression_key (str, optional) – Key to access the gene expression data in the layers of the AnnData object. If None, uses raw expression data in .X. Default is ‘MAGIC_imputed_data’.
distances_key (str, optional) – Key to access the distances matrix in the obsp of the AnnData object. Default is ‘distances’.
localvar_key (str, optional) – Key under which the computed local variability matrix is stored in the layers of the AnnData object. Default is ‘local_variability’.
progress (bool, optional) – Show progress bar. Requires tqdm to be installed. Default is False.
eps (float, optional) – A small value preventing division by 0. Defaults to 1e-16.

Returns:

local_variability – A 2D numpy array of local variability scores for each gene in each cell.

Return type:

np.ndarray
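As a rough illustration only, one way to score local variability from a neighbour-distance matrix is to average, over each cell's neighbours, the squared expression change per unit distance, with eps guarding the division. This is a simplified stand-in chosen for the sketch, not necessarily the statistic palantir computes; it also assumes a small dense distance matrix in which 0 means "not a neighbour", and local_variability_sketch is a hypothetical name.

```python
import numpy as np


def local_variability_sketch(
    expr: np.ndarray, distances: np.ndarray, eps: float = 1e-16
) -> np.ndarray:
    """Hypothetical sketch: cells x genes matrix of neighbourhood variability scores."""
    n_cells, n_genes = expr.shape
    scores = np.zeros((n_cells, n_genes))
    for i in range(n_cells):
        neighbours = np.flatnonzero(distances[i])
        if neighbours.size == 0:
            continue  # isolated cell: leave its scores at zero
        d = distances[i, neighbours][:, None]
        # Squared expression change per unit distance; eps avoids division by 0.
        change = ((expr[neighbours] - expr[i]) / (d + eps)) ** 2
        scores[i] = change.mean(axis=0)
    return scores


expr = np.array([[0.0, 1.0], [0.0, 3.0], [0.0, 5.0]])
dist = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]])
scores = local_variability_sketch(expr, dist)
```

The result has the same cells × genes shape as the expression matrix, which is why the documented function can store it in a layer; a gene that is constant across neighbours (the first column here) scores zero everywhere.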

palantir.utils.run_low_density_variability(ad: AnnData, cell_mask: str | ndarray | List[str] | Series | Index = 'branch_masks', density_key: str = 'mellon_log_density', localvar_key: str = 'local_variability', score_key: str = 'low_density_gene_variability') → ndarray

Compute per-gene scores that aggregate local gene variability over low-density cell-state transitions.

Parameters:

ad (AnnData) – AnnData object containing the gene expression data and pseudotime.
cell_mask (str, np.ndarray, list of str, pd.Series, pd.Index, optional) – If a string, key to access the mask matrix in the obsm or obs attributes of the AnnData object. If cell_mask is a numpy array with shape (ad.n_obs,), it is used directly. If cell_mask is a list of cell names, a pd.Series, or a pd.Index, it is used to create a boolean mask of length ad.n_obs. Default is ‘branch_masks’.
density_key (str, optional) – Key to access the density values in the obs attribute of the AnnData object. Default is ‘mellon_log_density’.
localvar_key (str, optional) – Key to access local variability matrix in the layers of the AnnData object. Default is ‘local_variability’.
score_key (str, optional) – Prefix of the key under which the computed scores are stored in the var attribute of the AnnData object. The actual keys are ‘{score_key}_{branch_name}’ if cell_mask refers to a key in ad.obsm. Default is ‘low_density_gene_variability’.

Returns:

low_density_scores – A numpy array of scores for each gene.

Return type:

np.ndarray

Raises:

ValueError – If any of the provided keys are not found in the appropriate fields of the AnnData object.
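Converting a list of cell names, a pd.Series, or a pd.Index into a boolean mask of length ad.n_obs can be done with pandas' Index.isin, as in this sketch. Here obs_names plays the role of ad.obs_names, and names_to_mask is a hypothetical helper; palantir's internal conversion may differ in detail.

```python
import numpy as np
import pandas as pd


def names_to_mask(obs_names: pd.Index, cells) -> np.ndarray:
    """Boolean array, one entry per observation, True where the cell is listed."""
    # Index.isin accepts lists, Series, and Index objects alike.
    return np.asarray(obs_names.isin(cells), dtype=bool)


obs_names = pd.Index(["a", "b", "c", "d"])
mask = names_to_mask(obs_names, ["b", "d"])
```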

palantir.utils.run_magic_imputation(data: ndarray | DataFrame | AnnData | csr_matrix, dm_res: dict | None = None, n_steps: int = 3, sim_key: str = 'DM_Similarity', expression_key: str = None, imputation_key: str = 'MAGIC_imputed_data', n_jobs: int = -1, sparse: bool = True, clip_threshold: float = 0.01) → DataFrame | None | csr_matrix | ndarray

Run MAGIC imputation on the data.

Parameters:

data (Union[np.ndarray, pd.DataFrame, AnnData, csr_matrix]) – Array or DataFrame of cells X genes, AnnData object, or a sparse csr_matrix.
dm_res (Union[dict, None], optional) – Diffusion map results from run_diffusion_maps. If None and data is an AnnData object, its obsp[sim_key] is used. Default is None.
n_steps (int, optional) – Number of steps in the diffusion operator. Default is 3.
expression_key (str, optional) – Key to access the gene expression data in the layers of the AnnData object. If None, uses raw expression data in .X. Default is None.
sim_key (str, optional) – Key to access the similarity in obsp of data if it is an AnnData object. Default is ‘DM_Similarity’.
imputation_key (str, optional) – Key to store the imputed data in layers of data if it is an AnnData object. Default is ‘MAGIC_imputed_data’.
n_jobs (int, optional) – Number of cores to use for parallel processing. If -1, all available cores are used. Default is -1.

Returns:

Imputed data matrix. The return type matches the input type:
- For numpy arrays or csr_matrix, returns a numpy array or csr_matrix.
- For pandas DataFrame, returns a pandas DataFrame.
- For AnnData, stores the result in the object and returns a numpy array or csr_matrix.

Return type:

Union[pd.DataFrame, None, csr_matrix, np.ndarray]
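The core of MAGIC-style imputation is diffusing expression values along the cell-cell graph. The sketch below assumes a dense, strictly positive similarity matrix, row-normalizes it into a Markov operator, raises it to the power n_steps, and applies it to a cells × genes matrix. magic_impute_sketch is a hypothetical name, and the sparse and clip_threshold options of the documented function are not modeled.

```python
import numpy as np


def magic_impute_sketch(X: np.ndarray, similarity: np.ndarray, n_steps: int = 3) -> np.ndarray:
    """Hypothetical dense sketch of MAGIC-style diffusion imputation."""
    # Row-normalize so each row of T is a probability distribution over cells.
    T = similarity / similarity.sum(axis=1, keepdims=True)
    # n_steps of diffusion: T raised to the power n_steps.
    T_t = np.linalg.matrix_power(T, n_steps)
    # Each imputed value is a weighted average over the cell's diffusion neighbourhood.
    return T_t @ X


rng = np.random.default_rng(2)
sim = rng.random((6, 6))
sim = sim + sim.T  # symmetric, strictly positive similarities
X = np.column_stack([np.full(6, 5.0), rng.random(6)])
imputed = magic_impute_sketch(X, sim, n_steps=3)
```

Because every power of a row-stochastic matrix is row-stochastic, a gene with a constant value (the first column here) is left unchanged, while noisy genes are smoothed toward their neighbourhood average; larger n_steps smooths more.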

palantir.utils.run_pca(data: DataFrame | AnnData, n_components: int = 300, use_hvg: bool = True, pca_key: str = 'X_pca') → Tuple[DataFrame, array] | None

Run PCA on the data.

Parameters:

data (Union[pd.DataFrame, AnnData]) – DataFrame of cells X genes or AnnData object.
n_components (int, optional) – Number of principal components. Default is 300.
use_hvg (bool, optional) – Whether to use highly variable genes only for PCA. Default is True.
pca_key (str, optional) – Key to store the PCA projections in obsm of data if it is an AnnData object. Default is ‘X_pca’.

Returns:

Tuple of PCA projections of the data and the explained variance. If AnnData is passed as data, the results are written to the input object and None is returned.

Return type:

Union[Tuple[pd.DataFrame, np.array], None]
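For reference, a plain NumPy PCA that returns the same kind of pair (projections as a DataFrame plus a per-component variance array) looks like the sketch below. It centers the data and uses an SVD; it does not select highly variable genes, and it reports the explained variance ratio, which is an assumption since the text above only says "explained variance". pca_sketch is a hypothetical name.

```python
import numpy as np
import pandas as pd


def pca_sketch(data: pd.DataFrame, n_components: int = 300):
    """Hypothetical sketch: PCA projections and explained-variance ratios via SVD."""
    X = data.to_numpy(dtype=float)
    Xc = X - X.mean(axis=0)  # center each gene
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    k = min(n_components, s.size)
    # Projections onto the first k principal axes: U scaled by the singular values.
    proj = U[:, :k] * s[:k]
    variance = s**2 / (X.shape[0] - 1)
    ratio = variance[:k] / variance.sum()
    cols = [f"PC{i + 1}" for i in range(k)]
    return pd.DataFrame(proj, index=data.index, columns=cols), ratio


rng = np.random.default_rng(3)
cells = pd.DataFrame(rng.normal(size=(20, 8)), index=[f"cell{i}" for i in range(20)])
projections, var_ratio = pca_sketch(cells, n_components=5)
```

Requesting more components than the data supports (the default of 300 on a small matrix) simply caps k at min(n_cells, n_genes), in which case the ratios sum to 1.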