Real erythroid dataset
VelOT pipeline & imports
[1]:
import velot
import scanpy as sc
import matplotlib.pyplot as plt
Set-up parameters
Quick set-up parameters for cell-type cluster identification and velocity embedding projection as well as visualization configuration.
[2]:
figures_path = None
show = True
save = False
basis = "pca"
project_umap = True if basis == "pca" else False
clusters_key = "celltype"
Data loading & preprocessing
[3]:
# Load data
try:
adata = sc.read_h5ad("../../article/data/Gastrulation/erythroid_lineage.h5ad")
except:
print("No downloaded file found, retrieving from scvelo...")
adata = velot.datasets.erythroid()
# Preprocess
sc.pp.filter_cells(adata, min_counts=20)
sc.pp.filter_genes(adata, min_cells=10)
# velot.pp.pca(adata, n_pcs=20)
sc.pp.neighbors(adata, 20, use_rep="X_pca")
velot.pp.pseudotime(adata, root_cluster="Blood progenitors 1", cluster_key=clusters_key)
adata.obs["clusters_id"] = adata.obs[clusters_key].cat.codes
adata.obsm["X_pca"] = adata.obsm["X_pca"][:,:10]
We can have a quick overviw of the dataset and the pseudotime computed using DPT.
[15]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
velot.pl.dataset_overview_simple(adata, color=clusters_key,
title="Cell-type clusters", inframe=True,
show=False, ax=axes[0]
)
velot.pl.dataset_overview_simple(adata, color="dpt_pseudotime",
title="Diffusion pseudotime",
show=False, ax=axes[1]
)
plt.tight_layout()
plt.show()
Velocity field computation
Velocity computation can be done using the command velot.tl.velocity(). It will internally compute all the VelOT steps:
Build pairs of windows
Compute OT cost between windows
Smooth raw OT velocity vectors
Project the final field to UMAP embedding for visualization
[ ]:
velot.tl.velocity(
adata=adata,
basis=f"X_{basis}",
smooth=True,
n_clusters=1,
window_size=200,
min_window_size=20,
overlap_fraction=0,
# spatial_key="clusters_id",
tail_handling="drop", tail_threshold=20,
reg=0.2, lambda_time=1, n_epochs=100,
lambda_smooth=0.8, lambda_curl=0.8, lambda_divergence=0, k_smooth=30,
project_umap=project_umap
)
VelOT: Computing velocity field
Basis: X_pca (10D)
Cells: 9815
Smoothing: ON
[1/4] Building spatial-temporal windows...
Spatial clustering: 1 clusters via KMeans on X_pca (10D)
Built 48 window pairs across 1 clusters (skipped 0 small clusters)
Window size: 200, step: 200
[2/4] Computing OT velocity...
OT velocity computed: 9600 cells with velocity, 215 cells without (will be filled by smoothing)
[3/4] Smoothing velocity field...
Found torch compatible version. Running on cuda device
epoch 50/100: total=1.2697 reg=3.5762 smooth=0.0007 curl=0.0000
epoch 100/100: total=1.1336 reg=3.3612 smooth=0.0004 curl=0.0000
Smoothing complete: final total=1.1336
[4/4] Projecting to UMAP...
Velocity projected to UMAP (9815 cells)
Velocity projected to UMAP (9815 cells)
VelOT: Done.
AnnData object with n_obs × n_vars = 9815 × 14352
obs: 'sample', 'stage', 'sequencing.batch', 'theiler', 'celltype', 'n_counts', 'dpt_pseudotime', 'pseudotime', 'clusters_id', 'velot_spatial_cluster', 'velot_confidence'
var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'MURK_gene', 'Δm', 'scaled Δm', 'n_cells'
uns: 'celltype_colors', 'neighbors', 'iroot', 'diffmap_evals', 'velot_windows', 'velot_smoothing'
obsm: 'X_pca', 'X_umap', 'X_diffmap', 'velot_velocity_raw_pca', 'velot_velocity_pca', 'velot_velocity_raw_umap', 'velot_velocity_umap'
layers: 'spliced', 'unspliced'
obsp: 'distances', 'connectivities'
Visualization
There are multiple representations available in the pipeline. All of them are accessible via the velot.pl module.
Velocity field stream
[6]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
velot.pl.velocity_stream(adata, color=clusters_key,
title="VelOT stream velocity",
figsize=(5,5), show=False, ax=axes[0]
)
velot.pl.dataset_overview_simple(adata, color="velot_confidence",
title="VelOT velocity confidence",
show=False, ax=axes[1]
)
plt.tight_layout()
plt.show()
Raw and smoothed velocity vectors
[7]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
velot.pl.velocity_quiver(adata,
color=clusters_key, basis="umap", subsample=500,
velocity_key=f"velot_velocity_raw_umap",
title="VelOT raw velocity vectors",
show=False, ax=axes[0]
)
velot.pl.velocity_quiver(adata,
color=clusters_key, basis="umap", subsample=500,
velocity_key=f"velot_velocity_umap",
title="VelOT smooth velocity vectors",
show=False, ax=axes[1]
)
plt.tight_layout()
plt.show()
CBDir & ICCoh metrics
[ ]:
edges = [
('Blood progenitors 1', 'Blood progenitors 2'),
('Blood progenitors 2', 'Erythroid1'),
('Erythroid1', 'Erythroid2'),
('Erythroid2', 'Erythroid3')
]
results = velot.metrics.summary(adata,
cluster_edges=edges, cluster_key=clusters_key,
embedding_key=f"X_{basis}", velocity_key=f"velot_velocity_{basis}",
print_results=False
)
velot.pl.metric_summary(results,
orientation="vertical", layout="column",
figsize=(7,7), show=show
)
Further analysis
Windows construction for OT solve
[14]:
n_total_pairs = adata.uns["velot_windows"]["n_pairs"]
pairs_to_show = tuple([i*10 for i in range(n_total_pairs)])
velot.pl.windows(adata, basis="umap",
pairs_to_show=pairs_to_show[:4], ncols=2,
pair_title=True, figsize_per_panel=(5,5), show=show
)
Training curves for the smoothing step
[10]:
velot.pl.training_curves_single(adata, figsize=(5,5), show=show)