PAGA

[1]:
# load modules
import os
import numpy as np
import pandas as pd
import pickle

# Plotting imports
import matplotlib
import matplotlib.pyplot as plt

import palantir
import scanpy as sc
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Lato'] not found. Falling back to DejaVu Sans.
[3]:
%matplotlib inline

Download data

Anndata objects with all the data and metadata are publically avaiable at: https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep[1-3].h5ad. This notebook use replicate 1 (https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad) for illustration.

Description of the anndata object is available at https://s3.amazonaws.com/dp-lab-data-public/palantir/readme

Load data

[4]:
# Load the AnnData object
load_ad = sc.read('annadata/human_cd34_bm_rep1.h5ad')
colors = pd.Series(load_ad.uns['cluster_colors'])
ct_colors = pd.Series(load_ad.uns['ct_colors'])

PAGA

[5]:
# Start from the counts
ad = sc.AnnData(load_ad.raw.X)
ad.obs_names = load_ad.obs_names
ad.var_names = load_ad.var_names
[6]:
# Preprocess
sc.pp.recipe_zheng17(ad, log=True)
sc.tl.pca(ad)
sc.pp.neighbors(ad, n_neighbors=60)
sc.tl.draw_graph(ad)
sc.tl.louvain(ad, resolution=1.2)
running recipe zheng17
filtered out 27 genes that are detected in less than 1 counts
    normalizing by total count per cell
        finished (0:00:00.24): normalized adata.X and added
        'n_counts_all', counts per cell before normalization (adata.obs)
    extracting highly variable genes
    the 1000 top genes correspond to a normalized dispersion cutoff of
        finished (0:00:00.95)
    normalizing by total count per cell
        finished (0:00:00.02): normalized adata.X and added
        'n_counts', counts per cell before normalization (adata.obs)
    ... scale_data: as `zero_center=True`, sparse input is densified and may lead to large memory consumption
    finished (0:00:00.02)
Note that scikit-learn's randomized PCA might not be exactly reproducible across different computational platforms. For exact reproducibility, choose `svd_solver='arpack'.` This will likely become the Scanpy default in the future.
computing PCA with n_comps = 50
    finished (0:00:00.16)
    and added
    'X_pca', the PCA coordinates (adata.obs)
    'PC1', 'PC2', ..., the loadings (adata.var)
    'pca_variance', the variance / eigenvalues (adata.uns)
    'pca_variance_ratio', the variance ratio (adata.uns)
computing neighbors
    using 'X_pca' with n_pcs = 50
    computed neighbors (0:00:00.83)
    computed connectivities (0:00:06.78)
    finished (0:00:00.02) --> added to `.uns['neighbors']`
    'distances', distances for each pair of neighbors
    'connectivities', weighted adjacency matrix
drawing single-cell graph using layout "fa"
    finished (0:00:44.00) --> added
    'X_draw_graph_fa', graph_drawing coordinates (adata.obsm)
running Louvain clustering
    using the "louvain" package of Traag (2017)
    finished (0:00:01.45) --> found 10 clusters and added
    'louvain', the cluster labels (adata.obs, categorical)

Graph abstraction

[7]:
ax = sc.pl.draw_graph(ad, color=['louvain'], legend_loc='on data')
../../_images/notebooks_comparisons_paga_11_0.png
[8]:
def hex_to_rgb(value):
    value = value.lstrip('#')
    lv = len(value)
    return tuple(int(value[i:i + lv // 3], 16)/255 for i in range(0, lv, lv // 3))

[9]:
# Palantir colors for comparison
ad.uns['louvain_colors'][0] = hex_to_rgb(colors[0])
ad.uns['louvain_colors'][3] = hex_to_rgb(colors[2])
ad.uns['louvain_colors'][5] = hex_to_rgb(colors[8])
ad.uns['louvain_colors'][7] = hex_to_rgb(colors[5])
ad.uns['louvain_colors'][1] = hex_to_rgb(colors[1])
ad.uns['louvain_colors'][4] = hex_to_rgb(colors[4])
ad.uns['louvain_colors'][6] = hex_to_rgb(colors[3])
ad.uns['louvain_colors'][9] = hex_to_rgb(colors[7])
[10]:
ax = sc.pl.draw_graph(ad, color=['louvain'], legend_loc='on data')
../../_images/notebooks_comparisons_paga_14_0.png
[11]:
sc.tl.paga(ad, groups='louvain')
running PAGA
        initialized `.distances` `.connectivities`
    finished (0:00:00.46) --> added
    'paga/connectivities', connectivities adjacency (adata.uns)
    'paga/connectivities_tree', connectivities subtree (adata.uns)