Access and Analyze scanpy anndata Objects from a Manuscript

This guide shows how to access and analyze the scanpy anndata objects associated with a recent manuscript. It is aimed at computational biologists and data scientists working in genomics and related fields. Three replicates are available for download; their URLs are given in the data access code further down.

Each anndata object contains several elements crucial for comprehensive data analysis:

  1. .X: Filtered, normalized, and log-transformed count matrix.

  2. .raw: Filtered raw count matrix (before normalization and log transformation).

  3. .obsm['MAGIC_imputed_data']: Count matrix imputed with the MAGIC algorithm.

  4. .obsm['tsne']: t-SNE maps (as presented in the manuscript), generated using scaled diffusion components.

  5. .obs['clusters']: Cell clustering information.

  6. .obs['palantir_pseudotime']: Cell pseudo-time ordering, as determined by Palantir.

  7. .obs['palantir_diff_potential']: Palantir-determined differentiation potential of cells.

  8. .obsm['palantir_branch_probs']: Probabilities of cells branching into different lineages, according to Palantir.

  9. .uns['palantir_branch_probs_cell_types']: Labels for Palantir branch probabilities.

  10. .uns['ct_colors']: Color codes for cell types, as used in the manuscript.

  11. .uns['cluster_colors']: Color codes for cell clusters, as used in the manuscript.

Python Code for Data Access:

[1]:
import scanpy as sc

# Read each replicate from disk; if the local file is missing,
# sc.read downloads it from backup_url first
adata_Rep1 = sc.read(
    "../data/human_cd34_bm_rep1.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad",
)
adata_Rep2 = sc.read(
    "../data/human_cd34_bm_rep2.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad",
)
adata_Rep3 = sc.read(
    "../data/human_cd34_bm_rep3.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad",
)
[2]:
adata_Rep1
[2]:
AnnData object with n_obs × n_vars = 5780 × 14651
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
[3]:
adata_Rep2
[3]:
AnnData object with n_obs × n_vars = 6501 × 14913
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
[4]:
adata_Rep3
[4]:
AnnData object with n_obs × n_vars = 12046 × 14044
    obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
    uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
    obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'

Converting anndata Objects to Seurat Objects Using R

For researchers working with R and Seurat, the process to convert anndata objects to Seurat objects involves the following steps:

  1. Set Up R Environment and Libraries:

    • Load the necessary libraries: Seurat and anndata.

  2. Download and Read the Data:

    • Use curl::curl_download to download the anndata (.h5ad) files from the provided URLs.

    • Read the data using the read_h5ad method from the anndata library.

  3. Create Seurat Objects:

    • Use the CreateSeuratObject function to convert the data into Seurat objects, incorporating counts and metadata from the anndata object.

    • Transfer additional data like tSNE embeddings, imputed gene expressions, and cell fate probabilities into the appropriate slots in the Seurat object.

R Code Snippet:

[ ]:
# this cell only exists to allow running R code inside this python notebook using a conda kernel
import sys
import os

# Get the path to the python executable
python_executable_path = sys.executable

# Extract the path to the environment from the path to the python executable
env_path = os.path.dirname(os.path.dirname(python_executable_path))

print(
    f"Conda env path: {env_path}\n"
    "Please make sure you have R installed in the conda environment."
)

os.environ['R_HOME'] = os.path.join(env_path, 'lib', 'R')

%load_ext rpy2.ipython
[6]:
%%R
library(Seurat)
library(anndata)

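# Download one replicate (if not already on disk) and convert it to a Seurat object
create_seurat <- function(url) {
  # Map the remote URL to a local path under ../data/
  file_path <- sub("https://s3.amazonaws.com/dp-lab-data-public/palantir/", "../data/", url)
  if (!file.exists(file_path)) {
    curl::curl_download(url, file_path)
  }
  data <- read_h5ad(file_path)

  # anndata stores cells x genes; Seurat expects genes x cells, hence t().
  # Note: .X holds the filtered, normalized, log-transformed values,
  # which end up in the "counts" layer of the RNA assay here.
  seurat_obj <- CreateSeuratObject(
    counts = t(data$X),
    meta.data = data$obs,
    project = "CD34+ Bone Marrow Cells"
  )
  # t-SNE coordinates as a dimensional reduction
  tsne_data <- data$obsm[["tsne"]]
  rownames(tsne_data) <- rownames(data$obs)
  colnames(tsne_data) <- c("tSNE_1", "tSNE_2")
  seurat_obj[["tsne"]] <- CreateDimReducObject(
    embeddings = tsne_data,
    key = "tSNE_"
  )
  # MAGIC-imputed expression as a second assay (genes x cells)
  imputed_data <- t(data$obsm[["MAGIC_imputed_data"]])
  colnames(imputed_data) <- rownames(data$obs)
  rownames(imputed_data) <- rownames(data$var)
  seurat_obj[["MAGIC_imputed"]] <- CreateAssayObject(counts = imputed_data)
  # Palantir branch probabilities as per-cell metadata, one column per lineage
  fate_probs <- as.data.frame(data$obsm[["palantir_branch_probs"]])
  colnames(fate_probs) <- data$uns[["palantir_branch_probs_cell_types"]]
  rownames(fate_probs) <- rownames(data$obs)
  seurat_obj <- AddMetaData(seurat_obj, metadata = fate_probs)

  return(seurat_obj)
}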

human_cd34_bm_Rep1 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad")
human_cd34_bm_Rep2 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad")
human_cd34_bm_Rep3 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad")
R[write to console]: Loading required package: SeuratObject

R[write to console]: Loading required package: sp

R[write to console]:
Attaching package: ‘SeuratObject’


R[write to console]: The following object is masked from ‘package:base’:

    intersect



    WARNING: The R package "reticulate" only fixed recently
    an issue that caused a segfault when used with rpy2:
    https://github.com/rstudio/reticulate/pull/1188
    Make sure that you use a version of that package that includes
    the fix.

R[write to console]:
Attaching package: ‘anndata’


R[write to console]: The following object is masked from ‘package:SeuratObject’:

    Layers


R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Data is of class matrix. Coercing to dgCMatrix.

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Data is of class matrix. Coercing to dgCMatrix.

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Data is of class matrix. Coercing to dgCMatrix.

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

R[write to console]: Warning:
R[write to console]:  Feature names cannot have underscores ('_'), replacing with dashes ('-')

[7]:
%%R

human_cd34_bm_Rep1
An object of class Seurat
29302 features across 5780 samples within 2 assays
Active assay: RNA (14651 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne
[8]:
%%R

human_cd34_bm_Rep2
An object of class Seurat
29826 features across 6501 samples within 2 assays
Active assay: RNA (14913 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne
[9]:
%%R

human_cd34_bm_Rep3
An object of class Seurat
28088 features across 12046 samples within 2 assays
Active assay: RNA (14044 features, 0 variable features)
 1 layer present: counts
 1 other assay present: MAGIC_imputed
 1 dimensional reduction calculated: tsne