DPT

[1]:
# load modules
import os
import numpy as np
import pandas as pd
import pickle

# Plotting imports
import matplotlib
import matplotlib.pyplot as plt

import palantir
import scanpy as sc
[3]:
%matplotlib inline

Download data

AnnData objects with all the data and metadata are publicly available at: https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep[1-3].h5ad. This notebook uses replicate 1 (https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad) for illustration.

A description of the AnnData object is available at https://s3.amazonaws.com/dp-lab-data-public/palantir/readme
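
The `[1-3]` in the first URL stands for the three replicate files. As a minimal sketch, the files could be fetched with the standard library as shown here. The helper names `replicate_url` and `download_replicate` are hypothetical and not part of Palantir or Scanpy, and the default target directory simply copies the path used in the load cell of this notebook.

```python
import os
import urllib.request

BASE_URL = "https://s3.amazonaws.com/dp-lab-data-public/palantir"


def replicate_url(rep):
    """Return the URL of replicate `rep` (1, 2 or 3)."""
    if rep not in (1, 2, 3):
        raise ValueError("replicate must be 1, 2 or 3")
    return f"{BASE_URL}/human_cd34_bm_rep{rep}.h5ad"


def download_replicate(rep, dest_dir="annadata"):
    """Download one replicate unless it already exists; return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    url = replicate_url(rep)
    path = os.path.join(dest_dir, os.path.basename(url))
    if not os.path.exists(path):
        # Network access happens only here, when the file is missing.
        urllib.request.urlretrieve(url, path)
    return path
```

Calling `download_replicate(1)` would place `human_cd34_bm_rep1.h5ad` in the directory that the load cell reads from.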

Load data

[4]:
# Load the AnnData object
ad = sc.read('annadata/human_cd34_bm_rep1.h5ad')
colors = pd.Series(ad.uns['cluster_colors'])
ct_colors = pd.Series(ad.uns['ct_colors'])

DPT

[5]:
# Set the start / root cell
ad.uns['iroot'] = np.flatnonzero(ad.obs_names == ad.obs['palantir_pseudotime'].idxmin())[0]
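
The assignment above converts a cell label into the integer position stored in `ad.uns['iroot']`: `idxmin()` returns the name of the cell with the smallest Palantir pseudotime, and `np.flatnonzero` turns that name into a 0-based position. A minimal sketch of the same lookup on toy data follows; the cell names and pseudotime values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for ad.obs_names and ad.obs['palantir_pseudotime'].
obs_names = pd.Index(["cell_a", "cell_b", "cell_c", "cell_d"])
pseudotime = pd.Series([0.40, 0.15, 0.02, 0.90], index=obs_names)

# idxmin() returns the label of the earliest cell, not its position.
root_label = pseudotime.idxmin()

# Convert the label to a 0-based position, as in the cell above.
iroot = np.flatnonzero(obs_names == root_label)[0]

# For unique cell names, Index.get_loc yields the same position.
assert iroot == obs_names.get_loc(root_label)
print(root_label, iroot)
```

The sketch assumes cell names are unique; with duplicated names, taking element `[0]` would silently pick the first match.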
[6]:
# PCA, tSNE, diffusion maps and DPT
sc.pp.pca(ad, n_comps=300)
sc.tl.tsne(ad)
sc.pp.neighbors(ad, n_neighbors=50)
sc.tl.diffmap(ad, n_comps=10)
sc.tl.dpt(ad, n_dcs=10, n_branchings=3, copy=False)
Note that scikit-learn's randomized PCA might not be exactly reproducible across different computational platforms. For exact reproducibility, choose `svd_solver='arpack'.` This will likely become the Scanpy default in the future.
computing PCA with n_comps = 300
    finished (0:00:04.45)
    and added
    'X_pca', the PCA coordinates (adata.obs)
    'PC1', 'PC2', ..., the loadings (adata.var)
    'pca_variance', the variance / eigenvalues (adata.uns)
    'pca_variance_ratio', the variance ratio (adata.uns)
computing tSNE
    using 'X_pca' with n_pcs = 300
    using the 'MulticoreTSNE' package by Ulyanov (2017)
    finished (0:00:51.46) --> added
    'X_tsne', tSNE coordinates (adata.obsm)
computing neighbors
    using 'X_pca' with n_pcs = 300
    computed neighbors (0:00:00.72)
    computed connectivities (0:00:06.74)
    finished (0:00:00.02) --> added to `.uns['neighbors']`
    'distances', distances for each pair of neighbors
    'connectivities', weighted adjacency matrix
computing Diffusion Maps using n_comps=10(=n_dcs)
        initialized `.distances` `.connectivities`
    computed transitions (0:00:00.04)
    eigenvalues of transition matrix
    [1.         0.9849899  0.97812945 0.9549652  0.9394395  0.92903304
     0.9068288  0.90428394 0.8509435  0.8371804 ]
    finished (0:00:00.25) --> added
    'X_diffmap', diffmap coordinates (adata.obsm)
    'diffmap_evals', eigenvalues of transition matrix (adata.uns)
        initialized `.distances` `.connectivities` `.eigen_values` `.eigen_basis` `.distances_dpt`
computing Diffusion Pseudotime using n_dcs=10
    this uses a hierarchical implementation
        detect 3 branchings
        do not consider groups with less than 57 points for splitting
        group 0 score 1.555689 n_points 5780
        branching 1: split group 0
        group 0 score 1.3113303 n_points 63
        group 1 score 1.6766814 n_points 1196
        group 2 score 1.1366823 n_points 562
        group 3 score 1.6420908 n_points 2799
        branching 2: split group 1
        group 0 score 1.3113303 n_points 63
        group 1 score 0.26748806 n_points 5 (too small)
        group 2 score 1.1366823 n_points 562
        group 3 score 1.6420908 n_points 2799
        group 4 score 1.2083104 n_points 970
        group 5 score 1.333738 n_points 151
        group 6 score 1.0511937 n_points 47 (too small)
        branching 3: split group 3
    finished (0:00:04.93) --> added
    'dpt_pseudotime', the pseudotime (adata.obs)
    'dpt_groups', the branching subgroups of dpt (adata.obs)
    'dpt_order', cell order (adata.obs)
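
The log line "do not consider groups with less than 57 points for splitting" is consistent with a threshold of 1% of the 5780 cells, rounded down. The sketch below reproduces that arithmetic with the group sizes reported after the second branching. It assumes that `sc.tl.dpt` ran with a `min_group_size` of 0.01 (the call in this notebook does not set it) and that a fractional value is converted to a cell count by truncation; both are assumptions about Scanpy's behavior rather than something the log states.

```python
# Values read from the log above.
n_cells = 5780
group_sizes = [63, 5, 562, 2799, 970, 151, 47]

# Assumption: min_group_size is a fraction of all cells (0.01 here)
# and is converted to a count by truncating toward zero.
min_group_size = 0.01
threshold = int(min_group_size * n_cells)

# Groups below the threshold are the ones the log marks "(too small)".
too_small = [n for n in group_sizes if n < threshold]
print(threshold, too_small)
```

Under these assumptions the groups with 5 and 47 points fall below the threshold, matching the two "(too small)" annotations in the log.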

The plot below shows the tSNE map colored by cluster, using the same color scheme as in Fig. 2.

[7]:
plt.scatter(ad.obsm['X_tsne'][:, 0], ad.obsm['X_tsne'][:, 1],
           s=3, color=colors[ad.obs['clusters']])
ax = plt.gca()
ax.set_axis_off()
../../_images/notebooks_comparisons_dpt_11_0.png

Pseudotime

[8]:
plt.scatter(ad.obsm['X_tsne'][:, 0], ad.obsm['X_tsne'][:, 1],
           s=3, c=ad.obs['dpt_pseudotime'], cmap=matplotlib.cm.plasma)
ax = plt.gca()
ax.set_axis_off()
../../_images/notebooks_comparisons_dpt_13_0.png

Branches

Branches identified by DPT

[9]:
branches = ad.obs['dpt_groups'].unique()
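# 'hls' is not a built-in matplotlib colormap; sample the cyclic 'hsv' map evenly instead
dpt_colors = matplotlib.colormaps["hsv"](np.linspace(0, 1, len(branches), endpoint=False))
dpt_colors = pd.Series([matplotlib.colors.rgb2hex(rgba) for rgba in dpt_colors], index=branches)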
[13]:
plt.scatter(ad.obsm['X_tsne'][:, 0], ad.obsm['X_tsne'][:, 1],
           s=3, color=dpt_colors[ad.obs['dpt_groups'].values])
ax = plt.gca()
ax.set_axis_off()
../../_images/notebooks_comparisons_dpt_17_0.png