Access and Analyze scanpy anndata Objects from a Manuscript
This guide shows how to access and analyze the scanpy anndata objects associated with a recent manuscript. Three replicates are available for download; the Python code below reads each one from a local path and falls back to its public URL when the local file is missing.
Each anndata object contains several elements crucial for comprehensive data analysis:
- .X: Filtered, normalized, and log-transformed count matrix.
- .raw: Original, filtered raw count matrix.
- .obsm['MAGIC_imputed_data']: Imputed count matrix using MAGIC algorithm.
- .obsm['tsne']: t-SNE maps (as presented in the manuscript), generated using scaled diffusion components.
- .obs['clusters']: Cell clustering information.
- .obs['palantir_pseudotime']: Cell pseudo-time ordering, as determined by Palantir.
- .obs['palantir_diff_potential']: Palantir-determined differentiation potential of cells.
- .obsm['palantir_branch_probs']: Probabilities of cells branching into different lineages, according to Palantir.
- .uns['palantir_branch_probs_cell_types']: Labels for Palantir branch probabilities.
- .uns['ct_colors']: Color codes for cell types, as used in the manuscript.
- .uns['cluster_colors']: Color codes for cell clusters, as used in the manuscript.
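The branch probabilities and their labels live in separate slots (obsm and uns), so it can help to join them into one labeled table. The sketch below does this with pandas on a small made-up matrix. The helper name label_branch_probs, the toy values, and the lineage names are illustrative assumptions and are not part of the dataset.

```python
import numpy as np
import pandas as pd


def label_branch_probs(probs, labels, cell_ids):
    """Return branch probabilities as a DataFrame (cells x lineages)."""
    probs = np.asarray(probs)
    labels = list(labels)
    if probs.shape[1] != len(labels):
        raise ValueError("one label is needed per probability column")
    return pd.DataFrame(probs, index=list(cell_ids), columns=labels)


# Toy stand-ins for adata.obsm['palantir_branch_probs'],
# adata.uns['palantir_branch_probs_cell_types'] and adata.obs_names.
toy_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
toy_labels = ["lineage_A", "lineage_B", "lineage_C"]  # hypothetical names
toy_cells = ["cell_1", "cell_2"]  # hypothetical cell IDs

branch_df = label_branch_probs(toy_probs, toy_labels, toy_cells)

# Most probable lineage for each cell
top_lineage = branch_df.idxmax(axis=1)
```

With a loaded object, the same helper could be called with the three attributes named in the comment above in place of the toy inputs.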
Python Code for Data Access:
[1]:
import scanpy as sc
# Read in the data, with backup URLs provided
adata_Rep1 = sc.read(
    "../data/human_cd34_bm_rep1.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad",
)
adata_Rep2 = sc.read(
    "../data/human_cd34_bm_rep2.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad",
)
adata_Rep3 = sc.read(
    "../data/human_cd34_bm_rep3.h5ad",
    backup_url="https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad",
)
[2]:
adata_Rep1
[2]:
AnnData object with n_obs × n_vars = 5780 × 14651
obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
[3]:
adata_Rep2
[3]:
AnnData object with n_obs × n_vars = 6501 × 14913
obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
[4]:
adata_Rep3
[4]:
AnnData object with n_obs × n_vars = 12046 × 14044
obs: 'clusters', 'palantir_pseudotime', 'palantir_diff_potential'
uns: 'cluster_colors', 'ct_colors', 'palantir_branch_probs_cell_types'
obsm: 'tsne', 'MAGIC_imputed_data', 'palantir_branch_probs'
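All three printed summaries list the same obs, uns, and obsm keys. A small check like the one below can confirm that after a download. It is a minimal sketch using only the standard library; the helper name missing_keys and the plain-dict stand-ins are assumptions made for illustration.

```python
# Keys copied from the printed AnnData summaries above.
EXPECTED_KEYS = {
    "obs": ["clusters", "palantir_pseudotime", "palantir_diff_potential"],
    "uns": ["cluster_colors", "ct_colors", "palantir_branch_probs_cell_types"],
    "obsm": ["tsne", "MAGIC_imputed_data", "palantir_branch_probs"],
}


def missing_keys(present, expected=EXPECTED_KEYS):
    """Return {slot: [expected keys absent from that slot]} for slots with gaps."""
    gaps = {}
    for slot, keys in expected.items():
        found = set(present.get(slot, []))
        absent = [key for key in keys if key not in found]
        if absent:
            gaps[slot] = absent
    return gaps


# Plain-dict stand-ins. For a loaded object one could instead pass
# {"obs": adata.obs.columns, "uns": adata.uns.keys(), "obsm": adata.obsm.keys()}.
complete = {slot: list(keys) for slot, keys in EXPECTED_KEYS.items()}
incomplete = {"obs": ["clusters"], "uns": EXPECTED_KEYS["uns"], "obsm": ["tsne"]}

complete_gaps = missing_keys(complete)
incomplete_gaps = missing_keys(incomplete)
```

An empty result means every expected key was found; otherwise the result names the slot and the keys that are absent.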
Converting anndata Objects to Seurat Objects Using R
For researchers working with R and Seurat, the process to convert anndata objects to Seurat objects involves the following steps:
1. Set Up R Environment and Libraries:
   - Load the necessary libraries: Seurat and anndata.
2. Download and Read the Data:
   - Use curl::curl_download to download the anndata files from the provided URLs.
   - Read the data using the read_h5ad function from the anndata library.
3. Create Seurat Objects:
   - Use the CreateSeuratObject function to convert the data into Seurat objects, incorporating counts and metadata from the anndata object.
   - Transfer additional data like tSNE embeddings, imputed gene expressions, and cell fate probabilities into the appropriate slots in the Seurat object.
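The conversion transposes the expression matrix because the two formats are oriented differently: anndata stores one row per cell and one column per gene, while Seurat expects one row per gene and one column per cell. The sketch below shows the idea on a toy matrix with pandas; the gene names, cell names, and values are made up for illustration.

```python
import numpy as np
import pandas as pd

cells = ["cell_1", "cell_2", "cell_3"]  # hypothetical cell IDs
genes = ["GENE_A", "GENE_B"]  # hypothetical gene names

# anndata-style layout: one row per cell, one column per gene.
cells_by_genes = pd.DataFrame(
    np.array([[1.0, 0.0], [2.0, 3.0], [0.0, 4.0]]),
    index=cells,
    columns=genes,
)

# Seurat-style layout: one row per gene, one column per cell.
genes_by_cells = cells_by_genes.T
```

The transpose keeps each value attached to the same gene and cell; only the axes swap.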
R Code Snippet:
[ ]:
# this cell only exists to allow running R code inside this python notebook using a conda kernel
import sys
import os
# Get the path to the python executable
python_executable_path = sys.executable
# Extract the path to the environment from the path to the python executable
env_path = os.path.dirname(os.path.dirname(python_executable_path))
print(
    f"Conda env path: {env_path}\n"
    "Please make sure you have R installed in the conda environment."
)
os.environ['R_HOME'] = os.path.join(env_path, 'lib', 'R')
%load_ext rpy2.ipython
[6]:
%%R
library(Seurat)
library(anndata)
create_seurat <- function(url) {
  # Map the public URL to a local path under ../data/ and download if needed
  file_path <- sub("https://s3.amazonaws.com/dp-lab-data-public/palantir/", "../data/", url)
  if (!file.exists(file_path)) {
    curl::curl_download(url, file_path)
  }
  data <- read_h5ad(file_path)

  # anndata stores cells x genes; Seurat expects genes x cells, hence t().
  # Note that X holds the normalized, log-transformed values, not raw counts.
  seurat_obj <- CreateSeuratObject(
    counts = t(data$X),
    meta.data = data$obs,
    project = "CD34+ Bone Marrow Cells"
  )

  # t-SNE embedding from the manuscript
  tsne_data <- data$obsm[["tsne"]]
  rownames(tsne_data) <- rownames(data$obs)
  colnames(tsne_data) <- c("tSNE_1", "tSNE_2")
  seurat_obj[["tsne"]] <- CreateDimReducObject(
    embeddings = tsne_data,
    key = "tSNE_"
  )

  # MAGIC-imputed expression, stored as a second assay
  imputed_data <- t(data$obsm[["MAGIC_imputed_data"]])
  colnames(imputed_data) <- rownames(data$obs)
  rownames(imputed_data) <- rownames(data$var)
  seurat_obj[["MAGIC_imputed"]] <- CreateAssayObject(counts = imputed_data)

  # Palantir branch probabilities, added as per-cell metadata columns
  fate_probs <- as.data.frame(data$obsm[["palantir_branch_probs"]])
  colnames(fate_probs) <- data$uns[["palantir_branch_probs_cell_types"]]
  rownames(fate_probs) <- rownames(data$obs)
  seurat_obj <- AddMetaData(seurat_obj, metadata = fate_probs)

  return(seurat_obj)
}
human_cd34_bm_Rep1 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep1.h5ad")
human_cd34_bm_Rep2 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep2.h5ad")
human_cd34_bm_Rep3 <- create_seurat("https://s3.amazonaws.com/dp-lab-data-public/palantir/human_cd34_bm_rep3.h5ad")
R[write to console]: Loading required package: SeuratObject
R[write to console]: Loading required package: sp
R[write to console]:
Attaching package: ‘SeuratObject’
R[write to console]: The following object is masked from ‘package:base’:
intersect
WARNING: The R package "reticulate" only fixed recently
an issue that caused a segfault when used with rpy2:
https://github.com/rstudio/reticulate/pull/1188
Make sure that you use a version of that package that includes
the fix.
R[write to console]:
Attaching package: ‘anndata’
R[write to console]: The following object is masked from ‘package:SeuratObject’:
Layers
R[write to console]: Warning:
R[write to console]: Feature names cannot have underscores ('_'), replacing with dashes ('-')
R[write to console]: Warning:
R[write to console]: Data is of class matrix. Coercing to dgCMatrix.
[7]:
%%R
human_cd34_bm_Rep1
An object of class Seurat
29302 features across 5780 samples within 2 assays
Active assay: RNA (14651 features, 0 variable features)
1 layer present: counts
1 other assay present: MAGIC_imputed
1 dimensional reduction calculated: tsne
[8]:
%%R
human_cd34_bm_Rep2
An object of class Seurat
29826 features across 6501 samples within 2 assays
Active assay: RNA (14913 features, 0 variable features)
1 layer present: counts
1 other assay present: MAGIC_imputed
1 dimensional reduction calculated: tsne
[9]:
%%R
human_cd34_bm_Rep3
An object of class Seurat
28088 features across 12046 samples within 2 assays
Active assay: RNA (14044 features, 0 variable features)
1 layer present: counts
1 other assay present: MAGIC_imputed
1 dimensional reduction calculated: tsne
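The feature totals in these summaries are sums over both assays. Each object carries the RNA assay and the MAGIC_imputed assay, and the conversion gives the imputed assay the same gene names as the original data, so the total should be twice the gene count. A quick arithmetic check using the numbers printed above:

```python
# (genes in the RNA assay, total features reported by Seurat) per replicate,
# copied from the printed summaries.
reported = {
    "Rep1": (14651, 29302),
    "Rep2": (14913, 29826),
    "Rep3": (14044, 28088),
}

n_assays = 2  # RNA and MAGIC_imputed

# True for a replicate when its total equals genes x number of assays.
totals_match = {
    rep: genes * n_assays == total for rep, (genes, total) in reported.items()
}
```

All three replicates satisfy the relation, which is consistent with the RNA gene counts matching the n_vars values of the corresponding anndata objects.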