Real erythroid dataset

VelOT pipeline & imports

[1]:
import velot
import scanpy as sc

import matplotlib.pyplot as plt

Set-up parameters

Quick set-up parameters for cell-type cluster identification and velocity embedding projection as well as visualization configuration.

[2]:
figures_path = None
show = True
save = False
basis = "pca"
project_umap = True if basis == "pca" else False
clusters_key = "celltype"

Data loading & preprocessing

[3]:
# Load data
try:
    adata = sc.read_h5ad("../../article/data/Gastrulation/erythroid_lineage.h5ad")
except:
    print("No downloaded file found, retrieving from scvelo...")
    adata = velot.datasets.erythroid()

# Preprocess
sc.pp.filter_cells(adata, min_counts=20)
sc.pp.filter_genes(adata, min_cells=10)

# velot.pp.pca(adata, n_pcs=20)
sc.pp.neighbors(adata, 20, use_rep="X_pca")
velot.pp.pseudotime(adata, root_cluster="Blood progenitors 1", cluster_key=clusters_key)
adata.obs["clusters_id"] = adata.obs[clusters_key].cat.codes
adata.obsm["X_pca"] = adata.obsm["X_pca"][:,:10]

We can have a quick overviw of the dataset and the pseudotime computed using DPT.

[15]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

velot.pl.dataset_overview_simple(adata, color=clusters_key,
    title="Cell-type clusters", inframe=True,
    show=False, ax=axes[0]
)
velot.pl.dataset_overview_simple(adata, color="dpt_pseudotime",
    title="Diffusion pseudotime",
    show=False, ax=axes[1]
)

plt.tight_layout()
plt.show()
../_images/tutorials_erythroid_tutorial_9_0.png

Velocity field computation

Velocity computation can be done using the command velot.tl.velocity(). It will internally compute all the VelOT steps:

  • Build pairs of windows

  • Compute OT cost between windows

  • Smooth raw OT velocity vectors

  • Project the final field to UMAP embedding for visualization

[ ]:
velot.tl.velocity(
    adata=adata,
    basis=f"X_{basis}",
    smooth=True,
    n_clusters=1,
    window_size=200,
    min_window_size=20,
    overlap_fraction=0,
    # spatial_key="clusters_id",
    tail_handling="drop", tail_threshold=20,
    reg=0.2, lambda_time=1, n_epochs=100,
    lambda_smooth=0.8, lambda_curl=0.8, lambda_divergence=0, k_smooth=30,
    project_umap=project_umap
)
VelOT: Computing velocity field
  Basis: X_pca (10D)
  Cells: 9815
  Smoothing: ON

[1/4] Building spatial-temporal windows...
  Spatial clustering: 1 clusters via KMeans on X_pca (10D)
  Built 48 window pairs across 1 clusters (skipped 0 small clusters)
  Window size: 200, step: 200

[2/4] Computing OT velocity...
  OT velocity computed: 9600 cells with velocity, 215 cells without (will be filled by smoothing)

[3/4] Smoothing velocity field...
Found torch compatible version. Running on cuda device
    epoch 50/100: total=1.2697 reg=3.5762 smooth=0.0007 curl=0.0000
    epoch 100/100: total=1.1336 reg=3.3612 smooth=0.0004 curl=0.0000
  Smoothing complete: final total=1.1336

[4/4] Projecting to UMAP...
  Velocity projected to UMAP (9815 cells)
  Velocity projected to UMAP (9815 cells)

VelOT: Done.
AnnData object with n_obs × n_vars = 9815 × 14352
    obs: 'sample', 'stage', 'sequencing.batch', 'theiler', 'celltype', 'n_counts', 'dpt_pseudotime', 'pseudotime', 'clusters_id', 'velot_spatial_cluster', 'velot_confidence'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand', 'MURK_gene', 'Δm', 'scaled Δm', 'n_cells'
    uns: 'celltype_colors', 'neighbors', 'iroot', 'diffmap_evals', 'velot_windows', 'velot_smoothing'
    obsm: 'X_pca', 'X_umap', 'X_diffmap', 'velot_velocity_raw_pca', 'velot_velocity_pca', 'velot_velocity_raw_umap', 'velot_velocity_umap'
    layers: 'spliced', 'unspliced'
    obsp: 'distances', 'connectivities'

Visualization

There are multiple representations available in the pipeline. All of them are accessible via the velot.pl module.

Velocity field stream

[6]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

velot.pl.velocity_stream(adata, color=clusters_key,
    title="VelOT stream velocity",
    figsize=(5,5), show=False, ax=axes[0]
)
velot.pl.dataset_overview_simple(adata, color="velot_confidence",
    title="VelOT velocity confidence",
    show=False, ax=axes[1]
)

plt.tight_layout()
plt.show()
../_images/tutorials_erythroid_tutorial_16_0.png

Raw and smoothed velocity vectors

[7]:
fig, axes = plt.subplots(1, 2, figsize=(10, 5))

velot.pl.velocity_quiver(adata,
    color=clusters_key, basis="umap", subsample=500,
    velocity_key=f"velot_velocity_raw_umap",
    title="VelOT raw velocity vectors",
    show=False, ax=axes[0]
)
velot.pl.velocity_quiver(adata,
    color=clusters_key, basis="umap", subsample=500,
    velocity_key=f"velot_velocity_umap",
    title="VelOT smooth velocity vectors",
    show=False, ax=axes[1]
)

plt.tight_layout()
plt.show()
../_images/tutorials_erythroid_tutorial_18_0.png

CBDir & ICCoh metrics

[ ]:
edges = [
    ('Blood progenitors 1', 'Blood progenitors 2'),
    ('Blood progenitors 2', 'Erythroid1'),
    ('Erythroid1', 'Erythroid2'),
    ('Erythroid2', 'Erythroid3')
]
results = velot.metrics.summary(adata,
    cluster_edges=edges, cluster_key=clusters_key,
    embedding_key=f"X_{basis}", velocity_key=f"velot_velocity_{basis}",
    print_results=False
)
velot.pl.metric_summary(results,
    orientation="vertical", layout="column",
    figsize=(7,7), show=show
)
../_images/tutorials_erythroid_tutorial_20_0.png

Further analysis

Windows construction for OT solve

[14]:
n_total_pairs = adata.uns["velot_windows"]["n_pairs"]
pairs_to_show = tuple([i*10 for i in range(n_total_pairs)])
velot.pl.windows(adata, basis="umap",
    pairs_to_show=pairs_to_show[:4], ncols=2,
    pair_title=True, figsize_per_panel=(5,5), show=show
)
../_images/tutorials_erythroid_tutorial_23_0.png

Training curves for the smoothing step

[10]:
velot.pl.training_curves_single(adata, figsize=(5,5), show=show)
../_images/tutorials_erythroid_tutorial_25_0.png