Tools (velot.tl)

Core analysis functions for the VelOT pipeline.

Follows the scanpy/scvelo convention: functions modify adata in place and return it for optional chaining.

Quick usage:

import velot
velot.tl.velocity(adata)   # runs the full pipeline

Step-by-step usage:

velot.tl.build_windows(adata)
velot.tl.compute_ot_velocity(adata)
velot.tl.smooth_velocity(adata)
velot.tl.project_to_umap(adata)
velot.tl.build_windows(adata, basis='X_pca', n_clusters=None, window_size=None, overlap_fraction=0.5, min_window_size=20, spatial_key=None, tail_handling='force', tail_threshold=10, random_state=42)[source]

Build spatial-temporal windows for local OT velocity computation.

Instead of sorting all cells by pseudotime and creating global sequential windows (which fails when pseudotime does not correspond to spatial locality), this function:

  1. Clusters cells spatially in PCA space

  2. Within each cluster, sorts cells by pseudotime

  3. Creates overlapping temporal windows within each cluster

  4. Pairs consecutive windows for OT computation

This ensures OT is only computed between spatially nearby cells, even when cells of very different pseudotime coexist in the same region of the embedding.

Parameters:
  • adata (AnnData) – Must contain adata.obsm[basis] and adata.obs['pseudotime'].

  • basis (str) – Key in adata.obsm for the embedding to cluster in.

  • n_clusters (Optional[int]) – Number of spatial clusters. If None, chosen automatically based on dataset size.

  • window_size (Optional[int]) – Number of cells per temporal window. If None, chosen automatically.

  • overlap_fraction (float) – Fraction of overlap between consecutive temporal windows within each cluster. 0.5 means 50% overlap.

  • min_window_size (int) – Minimum number of cells to form a valid window. Clusters smaller than 2 * min_window_size are skipped.

  • spatial_key (Optional[str]) – Column in adata.obs with precomputed spatial cluster labels. If provided, skips KMeans clustering.

  • random_state (int) – Random seed for KMeans.

  • tail_handling (str)

  • tail_threshold (int)

Returns:

  • adata.obs['velot_spatial_cluster'] : spatial cluster labels

  • adata.uns['velot_windows'] : dict with window pair info

Return type:

adata, modified in place with

velot.tl.compute_ot_velocity(adata, basis='X_pca', reg=0.05, lambda_time=1.0, lambda_knn=1.0)[source]

Compute raw OT velocity from spatial-temporal windows.

For each window pair, Sinkhorn optimal transport is used to match cells in the source window to cells in the target window. The velocity for each cell is the weighted displacement under the transport plan.

Results are aggregated across all window pairs where a cell appears as a source. A per-cell confidence score tracks how many windows contributed to each cell’s velocity estimate.

Parameters:
  • adata (AnnData) – Must contain windows from velot.tl.build_windows().

  • basis (str) – Embedding key in adata.obsm.

  • reg (float) – Sinkhorn entropy regularization. Smaller = sharper plans.

  • lambda_time (float) – Penalty added to cost for backward-in-time transport.

  • lambda_knn (float) – Penalty added to cost for transport between non-neighbors.

Returns:

  • adata.obsm['velot_velocity'] : raw OT velocity (PCA space)

  • adata.obs['velot_confidence'] : contribution count per cell

Return type:

adata, modified in place with

velot.tl.smooth_velocity(adata, basis='pca', velocity_key='velot_velocity_raw', n_epochs=200, hidden_dim=128, lr=0.001, batch_size=256, lambda_smooth=0.5, lambda_curl=0.5, lambda_divergence=0.0, k_smooth=15, use_pseudotime=True, random_state=42, verbose=True)[source]

Smooth the raw OT velocity field using a neural network.

Three types of regularization are available:

  • Smoothness (lambda_smooth): neighboring cells should have similar velocities. Reduces noise and zig-zagging.

  • Curl penalty (lambda_curl): penalizes rotational components of the velocity field. Encourages irrotational flow where streamlines do not form loops. This is the strongest constraint for preventing crossing field lines.

  • Divergence penalty (lambda_divergence): penalizes the divergence of the velocity field. Encourages incompressible-like flow where cells neither accumulate nor deplete locally.

Parameters:
  • adata (AnnData) – Must contain adata.obsm['velot_velocity'].

  • basis (str) – Embedding key for cell coordinates.

  • n_epochs (int) – Training epochs.

  • hidden_dim (int) – Network hidden layer width.

  • lr (float) – Learning rate.

  • batch_size (int) – Cells per batch.

  • lambda_smooth (float) – Weight of KNN smoothness loss.

  • lambda_curl (float) – Weight of curl penalty. Higher values produce flow with fewer crossing streamlines. Recommended range: 0.0 (off) to 0.5.

  • lambda_divergence (float) – Weight of divergence penalty. Higher values produce more volume-preserving flow. Recommended range: 0.0 (off) to 0.5.

  • k_smooth (int) – Number of neighbors for smoothness.

  • use_pseudotime (bool) – Condition network on pseudotime.

  • random_state (int) – Random seed.

  • verbose (bool) – Print progress.

  • velocity_key (str)

Return type:

adata with smoothed velocity.

velot.tl.query_velocity(adata, positions, pseudotime_values=None)[source]

Query the learned continuous velocity field at arbitrary positions.

Unlike the stored velocity vectors in adata.obsm['velot_velocity'] which are only defined at cell locations, this function evaluates the trained neural velocity field at any point in PCA space.

This is what makes VelOT a true velocity FIELD rather than just a set of velocity vectors: given any point on the manifold, you can ask “what is the velocity here?”

Parameters:
  • positions (ndarray) – Array of shape (n_points, D) with PCA coordinates to query.

  • pseudotime_values (Optional[ndarray]) – Array of shape (n_points,) with pseudotime values for each query point. If None, uses the median pseudotime from the dataset.

  • adata (AnnData)

Return type:

ndarray

Returns:

  • Array of shape (n_points, D) with velocity vectors at each

  • query position.

Example

# Query at cell positions (same as stored vectors)
V = velot.tl.query_velocity(adata, adata.obsm["X_pca"])

# Query at arbitrary positions
import numpy as np
grid_points = np.random.randn(100, 30) * 0.1
V_grid = velot.tl.query_velocity(adata, grid_points)
velot.tl.project_to_embedding(adata, velocity_key='velot_velocity_pca', velocity_key_umap='velot_velocity_umap', basis_pca='X_pca', basis_embedding='X_umap', n_neighbors=30)[source]

Project velocity from PCA space to any 2D embedding space for visualization.

Uses a local linear approximation: for each cell, the Jacobian of the PCA→2D (UMAP, TSNE…) mapping is estimated from its KNN neighborhood via least-squares, and the PCA velocity is transformed accordingly.

Note: UMAP is for visualization only. The velocity in PCA space (adata.obsm['velot_velocity']) is the primary output.

Parameters:
  • adata (AnnData) – Must contain PCA and UMAP embeddings, and computed velocity.

  • velocity_key (str)

  • velocity_key_umap (str)

  • basis_pca (str)

  • basis_embedding (str)

  • n_neighbors (int)

Return type:

adata with adata.obsm['velocity_umap'] populated.

velot.tl.project_to_umap(adata, velocity_key='velot_velocity_pca', velocity_key_umap='velot_velocity_umap', basis_pca='X_pca', basis_umap='X_umap', n_neighbors=30)[source]

Project velocity from PCA space to UMAP space for visualization.

Uses a local linear approximation: for each cell, the Jacobian of the PCA→UMAP mapping is estimated from its KNN neighborhood via least-squares, and the PCA velocity is transformed accordingly.

Note: UMAP is for visualization only. The velocity in PCA space (adata.obsm['velot_velocity']) is the primary output.

Parameters:
  • adata (AnnData) – Must contain PCA and UMAP embeddings, and computed velocity.

  • velocity_key (str)

  • velocity_key_umap (str)

  • basis_pca (str)

  • basis_umap (str)

  • n_neighbors (int)

Return type:

adata with adata.obsm['velocity_umap'] populated.

velot.tl.velocity(adata, basis='X_pca', smooth=True, n_clusters=None, window_size=None, overlap_fraction=0.5, min_window_size=20, spatial_key=None, tail_handling='force', tail_threshold=10, reg=0.05, lambda_time=1.0, lambda_knn=1.0, n_epochs=200, hidden_dim=128, lambda_smooth=0.5, lambda_curl=0.0, lambda_divergence=0.0, k_smooth=15, use_pseudotime=True, project_umap=True, project_basis='X_umap', random_state=42, verbose=True)[source]

Run the full VelOT velocity pipeline.

This is a convenience function that calls, in order:
  1. build_windows() — spatial-temporal windowing

  2. compute_ot_velocity() — local OT velocity

  3. smooth_velocity() — neural smoothing (optional)

  4. project_to_umap() — PCA → UMAP projection (optional)

Parameters:
  • adata (AnnData) – Preprocessed AnnData (run velot.pp.prepare() first).

  • basis (str) – Embedding key for velocity computation.

  • smooth (bool) – Whether to apply neural smoothing.

  • project_umap (bool) – Whether to project velocity to UMAP for visualization.

  • verbose (bool) – Whether to print progress.

  • n_clusters (int | None)

  • window_size (int | None)

  • overlap_fraction (float)

  • min_window_size (int)

  • spatial_key (str | None)

  • tail_handling (str)

  • tail_threshold (int)

  • reg (float)

  • lambda_time (float)

  • lambda_knn (float)

  • n_epochs (int)

  • hidden_dim (int)

  • lambda_smooth (float)

  • lambda_curl (float)

  • lambda_divergence (float)

  • k_smooth (int)

  • use_pseudotime (bool)

  • project_basis (str)

  • random_state (int)

Return type:

adata with velocity fields computed.

Example

import velot
import scvelo as scv

adata = scv.datasets.pancreas()
velot.pp.prepare(adata, root_cluster="Ductal")
velot.tl.velocity(adata)
velot.pl.velocity_stream(adata)
velot.tl.gridsearch(adata, param_grid, cluster_edges=None, cluster_key='clusters', velocity_metric_key='velocity_umap', fixed_params=None, pseudotime_key='pseudotime', verbose=True)[source]

Run the VelOT pipeline across multiple parameter combinations and collect evaluation metrics for each.

Parameters:
  • adata (AnnData) – Preprocessed AnnData. Must already have PCA, neighbors, UMAP, and pseudotime computed (everything from velot.pp). A fresh copy is used for each parameter combination.

  • param_grid (dict) –

    Dictionary mapping parameter names to lists of values. Parameter names must match arguments of velot.tl.velocity(). Example:

    param_grid = {
        "basis": ["X_pca"],
        "reg": [0.01, 0.05, 0.1],
        "lambda_smooth": [0.1, 0.5, 1.0],
        "n_clusters": [10, 20],
    }
    

  • cluster_edges (Optional[list]) – Transition edges for CBDir metric. If None, only ICCoh is computed.

  • cluster_key (str) – Column in adata.obs with cluster labels.

  • velocity_metric_key (str) – Key in adata.obsm to evaluate metrics on.

  • fixed_params (Optional[dict]) –

    Parameters passed to velot.tl.velocity() for every run that are NOT part of the grid. Example:

    fixed_params = {"smooth": True, "n_epochs": 200}
    

  • pseudotime_key (str) – Column in adata.obs with pseudotime. Used to verify it exists before running.

  • verbose (bool) – Whether to print progress.

Returns:

  • pandas DataFrame with one row per parameter combination and

  • columns for each parameter, ICCoh mean, CBDir mean, and

  • per-cluster/per-edge scores.

Example

import velot

adata = velot.datasets.dentategyrus()
# ... preprocessing ...

param_grid = {
    "reg": [0.01, 0.05, 0.1, 0.5],
    "lambda_smooth": [0.1, 0.5, 1.0],
    "n_clusters": [10, 20, 30],
}

edges = [
    ("OPC", "OL"),
    ("Neuroblast", "Granule immature"),
]

results = velot.tl.gridsearch(
    adata,
    param_grid,
    cluster_edges=edges,
    cluster_key="clusters",
    fixed_params={"basis": "X_pca", "n_epochs": 200},
)

# Best by ICCoh
print(results.sort_values("iccoh_mean", ascending=False).head())

# Best by CBDir
print(results.sort_values("cbdir_mean", ascending=False).head())
velot.tl.compute_trajectories(adata, start_cells=None, start_cluster=None, target_cluster=None, start_pseudotime=None, end_pseudotime=None, n_trajectories=20, n_steps=200, step_size=0.05, direction='forward', velocity_key='velot_velocity_pca', basis='X_pca', cluster_key='clusters', k_velocity=15, use_network=True, evolve_pseudotime=True, max_attempts_factor=10, random_state=42)[source]

Compute cell trajectories by integrating the velocity field.

Supports three modes:
  • "forward": follow the flow from starting cells.

  • "backward": go against the flow to trace origins.

  • "both": compute both directions.

Starting cells can be selected by:
  • Explicit indices (start_cells).

  • Cluster name (start_cluster).

  • Pseudotime range (start_pseudotime / end_pseudotime).

  • Any combination: cluster + pseudotime range narrows the selection.

When target_cluster is specified, only trajectories that reach the target are kept. The function will attempt up to n_trajectories * max_attempts_factor integrations to find enough successful trajectories.

Parameters:
  • adata (AnnData) – Must have velocity computed.

  • start_cells (Optional[ndarray]) – Array of cell indices to start from. If provided, start_cluster and pseudotime range are ignored.

  • start_cluster (Optional[str]) – Cluster name to select starting cells from. Can be combined with start_pseudotime / end_pseudotime to further narrow the selection.

  • target_cluster (Optional[str]) – If provided, only keep trajectories whose terminal cluster (forward) or origin cluster (backward) matches this value.

  • start_pseudotime (Optional[float]) – Lower bound of pseudotime for selecting starting cells. Defaults to None (no lower bound). Can be used alone or combined with start_cluster.

  • end_pseudotime (Optional[float]) – Upper bound of pseudotime for selecting starting cells. Defaults to None (no upper bound). Can be used alone or combined with start_cluster.

  • n_trajectories (int) – Number of successful trajectories to collect.

  • n_steps (int) – Maximum integration steps.

  • step_size (float) – Euler step size as fraction of mean velocity magnitude.

  • direction (str) – "forward", "backward", or "both".

  • velocity_key (str) – Key in adata.obsm with velocity vectors.

  • basis (str) – Embedding key for cell coordinates.

  • cluster_key (str) – Cluster annotation column.

  • k_velocity (int) – KNN for velocity interpolation (fallback mode).

  • use_network (bool) – If True and the smoothing network is available, query the continuous velocity field directly.

  • evolve_pseudotime (bool) – If True, pseudotime evolves along the trajectory.

  • max_attempts_factor (int) – When target_cluster is set, try up to n_trajectories * max_attempts_factor starting cells to find enough trajectories that reach the target.

  • random_state (int) – Random seed.

Return type:

AnnData

Returns:

  • adata with adata.uns['velot_trajectories'].

  • The stored metadata for each trajectory includes an "id"

  • field that can be used with velot.pl.trajectories(trajectory_ids=...)

  • to plot specific trajectories.

Examples

All backward trajectories from Alpha:

velot.tl.compute_trajectories(
    adata, start_cluster="Alpha", direction="backward",
)

Only backward trajectories from Alpha that reach Ngn3 low EP:

velot.tl.compute_trajectories(
    adata, start_cluster="Alpha", direction="backward",
    target_cluster="Ngn3 low EP", n_trajectories=5,
)

Forward from progenitors that reach Epsilon:

velot.tl.compute_trajectories(
    adata, start_cluster="Ngn3 low EP", direction="forward",
    target_cluster="Epsilon", n_trajectories=10,
)

Forward from early cells (pseudotime 0 to 0.1) regardless of cluster:

velot.tl.compute_trajectories(
    adata, start_pseudotime=0.0, end_pseudotime=0.1,
    direction="forward",
)

Backward from Alpha cells with pseudotime > 0.8:

velot.tl.compute_trajectories(
    adata, start_cluster="Alpha", start_pseudotime=0.8,
    direction="backward",
)
velot.tl.simulate_flow(adata, n_particles=200, source_cluster=None, source_pseudotime_min=0.0, source_pseudotime_max=0.1, source_x_lim=(-inf, inf), source_y_lim=(-inf, inf), n_steps=300, step_size=0.05, diffusion=0.5, basis='X_pca', cluster_key='clusters', noise_scale=0.0, random_state=42)[source]

Simulate a flow of particles through the learned velocity field.

Drops particles at the “top of the river” (low pseudotime or a source cluster) and integrates them forward through the velocity field with optional stochastic diffusion.

The integration follows a stochastic differential equation:

x_{n+1} = x_n + dt * v(x_n, τ_n) + sqrt(2 * D * dt) * η_n

where D is the diffusion coefficient and η is Gaussian noise. The diffusion allows particles starting from the same region to explore different branches at bifurcation points, producing realistic fate distributions.

Parameters:
  • adata (AnnData) – Must have the smoothing network trained.

  • n_particles (int) – Number of particles to simulate.

  • source_cluster (Optional[str]) – Cluster to seed particles from. If None, uses cells with lowest pseudotime.

  • source_pseudotime_max (float) – If source_cluster is None, seed from cells with pseudotime below this value.

  • n_steps (int) – Maximum number of integration steps.

  • step_size (float) – Euler step size as fraction of mean velocity magnitude.

  • diffusion (float) – Diffusion coefficient controlling stochasticity. 0.0 = deterministic (all trajectories converge). 0.1-0.5 = mild stochasticity (some branching exploration). 1.0+ = strong stochasticity (wide exploration). The noise is scaled relative to the local velocity magnitude so it is adaptive.

  • basis (str) – PCA embedding key.

  • cluster_key (str) – Cluster annotation column.

  • noise_scale (float) – Gaussian noise added to INITIAL positions only. Separate from diffusion which is added at every step.

  • random_state (int) – Random seed.

  • source_pseudotime_min (float)

  • source_x_lim (tuple)

  • source_y_lim (tuple)

Return type:

adata with adata.uns['velot_flow'].

Examples

Deterministic flow (trajectories will converge):

velot.tl.simulate_flow(adata, diffusion=0.0, ...)

Stochastic flow (trajectories explore branches):

velot.tl.simulate_flow(adata, diffusion=0.3, ...)

Strong diffusion (wide exploration):

velot.tl.simulate_flow(adata, diffusion=1.0, ...)