Tools (velot.tl)
Core analysis functions for the VelOT pipeline.
Follows the scanpy/scvelo convention: functions modify adata in place and return it for optional chaining.
Quick usage:
import velot
velot.tl.velocity(adata) # runs the full pipeline
Step-by-step usage:
velot.tl.build_windows(adata)
velot.tl.compute_ot_velocity(adata)
velot.tl.smooth_velocity(adata)
velot.tl.project_to_umap(adata)
- velot.tl.build_windows(adata, basis='X_pca', n_clusters=None, window_size=None, overlap_fraction=0.5, min_window_size=20, spatial_key=None, tail_handling='force', tail_threshold=10, random_state=42)[source]
Build spatial-temporal windows for local OT velocity computation.
Instead of sorting all cells by pseudotime and creating global sequential windows (which fails when pseudotime does not correspond to spatial locality), this function:
Clusters cells spatially in PCA space
Within each cluster, sorts cells by pseudotime
Creates overlapping temporal windows within each cluster
Pairs consecutive windows for OT computation
This ensures OT is only computed between spatially nearby cells, even when cells of very different pseudotime coexist in the same region of the embedding.
- Parameters:
adata (
AnnData) – Must containadata.obsm[basis]andadata.obs['pseudotime'].basis (
str) – Key inadata.obsmfor the embedding to cluster in.n_clusters (
Optional[int]) – Number of spatial clusters. If None, chosen automatically based on dataset size.window_size (
Optional[int]) – Number of cells per temporal window. If None, chosen automatically.overlap_fraction (
float) – Fraction of overlap between consecutive temporal windows within each cluster. 0.5 means 50% overlap.min_window_size (
int) – Minimum number of cells to form a valid window. Clusters smaller than 2 * min_window_size are skipped.spatial_key (
Optional[str]) – Column inadata.obswith precomputed spatial cluster labels. If provided, skips KMeans clustering.random_state (
int) – Random seed for KMeans.tail_handling (str)
tail_threshold (int)
- Returns:
adata.obs['velot_spatial_cluster']: spatial cluster labelsadata.uns['velot_windows']: dict with window pair info
- Return type:
adata, modified in place with
- velot.tl.compute_ot_velocity(adata, basis='X_pca', reg=0.05, lambda_time=1.0, lambda_knn=1.0)[source]
Compute raw OT velocity from spatial-temporal windows.
For each window pair, Sinkhorn optimal transport is used to match cells in the source window to cells in the target window. The velocity for each cell is the weighted displacement under the transport plan.
Results are aggregated across all window pairs where a cell appears as a source. A per-cell confidence score tracks how many windows contributed to each cell’s velocity estimate.
- Parameters:
adata (
AnnData) – Must contain windows fromvelot.tl.build_windows().basis (
str) – Embedding key inadata.obsm.reg (
float) – Sinkhorn entropy regularization. Smaller = sharper plans.lambda_time (
float) – Penalty added to cost for backward-in-time transport.lambda_knn (
float) – Penalty added to cost for transport between non-neighbors.
- Returns:
adata.obsm['velot_velocity']: raw OT velocity (PCA space)adata.obs['velot_confidence']: contribution count per cell
- Return type:
adata, modified in place with
- velot.tl.smooth_velocity(adata, basis='pca', velocity_key='velot_velocity_raw', n_epochs=200, hidden_dim=128, lr=0.001, batch_size=256, lambda_smooth=0.5, lambda_curl=0.5, lambda_divergence=0.0, k_smooth=15, use_pseudotime=True, random_state=42, verbose=True)[source]
Smooth the raw OT velocity field using a neural network.
Three types of regularization are available:
Smoothness (
lambda_smooth): neighboring cells should have similar velocities. Reduces noise and zig-zagging.Curl penalty (
lambda_curl): penalizes rotational components of the velocity field. Encourages irrotational flow where streamlines do not form loops. This is the strongest constraint for preventing crossing field lines.Divergence penalty (
lambda_divergence): penalizes the divergence of the velocity field. Encourages incompressible-like flow where cells neither accumulate nor deplete locally.
- Parameters:
adata (
AnnData) – Must containadata.obsm['velot_velocity'].basis (
str) – Embedding key for cell coordinates.n_epochs (
int) – Training epochs.hidden_dim (
int) – Network hidden layer width.lr (
float) – Learning rate.batch_size (
int) – Cells per batch.lambda_smooth (
float) – Weight of KNN smoothness loss.lambda_curl (
float) – Weight of curl penalty. Higher values produce flow with fewer crossing streamlines. Recommended range: 0.0 (off) to 0.5.lambda_divergence (
float) – Weight of divergence penalty. Higher values produce more volume-preserving flow. Recommended range: 0.0 (off) to 0.5.k_smooth (
int) – Number of neighbors for smoothness.use_pseudotime (
bool) – Condition network on pseudotime.random_state (
int) – Random seed.verbose (
bool) – Print progress.velocity_key (str)
- Return type:
adata with smoothed velocity.
- velot.tl.query_velocity(adata, positions, pseudotime_values=None)[source]
Query the learned continuous velocity field at arbitrary positions.
Unlike the stored velocity vectors in
adata.obsm['velot_velocity']which are only defined at cell locations, this function evaluates the trained neural velocity field at any point in PCA space.This is what makes VelOT a true velocity FIELD rather than just a set of velocity vectors: given any point on the manifold, you can ask “what is the velocity here?”
- Parameters:
- Return type:
- Returns:
Array of shape (n_points, D) with velocity vectors at each
query position.
Example
# Query at cell positions (same as stored vectors) V = velot.tl.query_velocity(adata, adata.obsm["X_pca"]) # Query at arbitrary positions import numpy as np grid_points = np.random.randn(100, 30) * 0.1 V_grid = velot.tl.query_velocity(adata, grid_points)
- velot.tl.project_to_embedding(adata, velocity_key='velot_velocity_pca', velocity_key_umap='velot_velocity_umap', basis_pca='X_pca', basis_embedding='X_umap', n_neighbors=30)[source]
Project velocity from PCA space to any 2D embedding space for visualization.
Uses a local linear approximation: for each cell, the Jacobian of the PCA→2D (UMAP, TSNE…) mapping is estimated from its KNN neighborhood via least-squares, and the PCA velocity is transformed accordingly.
Note: UMAP is for visualization only. The velocity in PCA space (
adata.obsm['velot_velocity']) is the primary output.
- velot.tl.project_to_umap(adata, velocity_key='velot_velocity_pca', velocity_key_umap='velot_velocity_umap', basis_pca='X_pca', basis_umap='X_umap', n_neighbors=30)[source]
Project velocity from PCA space to UMAP space for visualization.
Uses a local linear approximation: for each cell, the Jacobian of the PCA→UMAP mapping is estimated from its KNN neighborhood via least-squares, and the PCA velocity is transformed accordingly.
Note: UMAP is for visualization only. The velocity in PCA space (
adata.obsm['velot_velocity']) is the primary output.
- velot.tl.velocity(adata, basis='X_pca', smooth=True, n_clusters=None, window_size=None, overlap_fraction=0.5, min_window_size=20, spatial_key=None, tail_handling='force', tail_threshold=10, reg=0.05, lambda_time=1.0, lambda_knn=1.0, n_epochs=200, hidden_dim=128, lambda_smooth=0.5, lambda_curl=0.0, lambda_divergence=0.0, k_smooth=15, use_pseudotime=True, project_umap=True, project_basis='X_umap', random_state=42, verbose=True)[source]
Run the full VelOT velocity pipeline.
- This is a convenience function that calls, in order:
build_windows()— spatial-temporal windowingcompute_ot_velocity()— local OT velocitysmooth_velocity()— neural smoothing (optional)project_to_umap()— PCA → UMAP projection (optional)
- Parameters:
adata (
AnnData) – Preprocessed AnnData (runvelot.pp.prepare()first).basis (
str) – Embedding key for velocity computation.smooth (
bool) – Whether to apply neural smoothing.project_umap (
bool) – Whether to project velocity to UMAP for visualization.verbose (
bool) – Whether to print progress.n_clusters (int | None)
window_size (int | None)
overlap_fraction (float)
min_window_size (int)
spatial_key (str | None)
tail_handling (str)
tail_threshold (int)
reg (float)
lambda_time (float)
lambda_knn (float)
n_epochs (int)
hidden_dim (int)
lambda_smooth (float)
lambda_curl (float)
lambda_divergence (float)
k_smooth (int)
use_pseudotime (bool)
project_basis (str)
random_state (int)
- Return type:
adata with velocity fields computed.
Example
import velot import scvelo as scv adata = scv.datasets.pancreas() velot.pp.prepare(adata, root_cluster="Ductal") velot.tl.velocity(adata) velot.pl.velocity_stream(adata)
- velot.tl.gridsearch(adata, param_grid, cluster_edges=None, cluster_key='clusters', velocity_metric_key='velocity_umap', fixed_params=None, pseudotime_key='pseudotime', verbose=True)[source]
Run the VelOT pipeline across multiple parameter combinations and collect evaluation metrics for each.
- Parameters:
adata (
AnnData) – Preprocessed AnnData. Must already have PCA, neighbors, UMAP, and pseudotime computed (everything fromvelot.pp). A fresh copy is used for each parameter combination.param_grid (
dict) –Dictionary mapping parameter names to lists of values. Parameter names must match arguments of
velot.tl.velocity(). Example:param_grid = { "basis": ["X_pca"], "reg": [0.01, 0.05, 0.1], "lambda_smooth": [0.1, 0.5, 1.0], "n_clusters": [10, 20], }
cluster_edges (
Optional[list]) – Transition edges for CBDir metric. If None, only ICCoh is computed.cluster_key (
str) – Column in adata.obs with cluster labels.velocity_metric_key (
str) – Key in adata.obsm to evaluate metrics on.fixed_params (
Optional[dict]) –Parameters passed to
velot.tl.velocity()for every run that are NOT part of the grid. Example:fixed_params = {"smooth": True, "n_epochs": 200}
pseudotime_key (
str) – Column in adata.obs with pseudotime. Used to verify it exists before running.verbose (
bool) – Whether to print progress.
- Returns:
pandas DataFrame with one row per parameter combination and
columns for each parameter, ICCoh mean, CBDir mean, and
per-cluster/per-edge scores.
Example
import velot adata = velot.datasets.dentategyrus() # ... preprocessing ... param_grid = { "reg": [0.01, 0.05, 0.1, 0.5], "lambda_smooth": [0.1, 0.5, 1.0], "n_clusters": [10, 20, 30], } edges = [ ("OPC", "OL"), ("Neuroblast", "Granule immature"), ] results = velot.tl.gridsearch( adata, param_grid, cluster_edges=edges, cluster_key="clusters", fixed_params={"basis": "X_pca", "n_epochs": 200}, ) # Best by ICCoh print(results.sort_values("iccoh_mean", ascending=False).head()) # Best by CBDir print(results.sort_values("cbdir_mean", ascending=False).head())
- velot.tl.compute_trajectories(adata, start_cells=None, start_cluster=None, target_cluster=None, start_pseudotime=None, end_pseudotime=None, n_trajectories=20, n_steps=200, step_size=0.05, direction='forward', velocity_key='velot_velocity_pca', basis='X_pca', cluster_key='clusters', k_velocity=15, use_network=True, evolve_pseudotime=True, max_attempts_factor=10, random_state=42)[source]
Compute cell trajectories by integrating the velocity field.
- Supports three modes:
"forward": follow the flow from starting cells."backward": go against the flow to trace origins."both": compute both directions.
- Starting cells can be selected by:
Explicit indices (
start_cells).Cluster name (
start_cluster).Pseudotime range (
start_pseudotime/end_pseudotime).Any combination: cluster + pseudotime range narrows the selection.
When
target_clusteris specified, only trajectories that reach the target are kept. The function will attempt up ton_trajectories * max_attempts_factorintegrations to find enough successful trajectories.- Parameters:
adata (
AnnData) – Must have velocity computed.start_cells (
Optional[ndarray]) – Array of cell indices to start from. If provided,start_clusterand pseudotime range are ignored.start_cluster (
Optional[str]) – Cluster name to select starting cells from. Can be combined withstart_pseudotime/end_pseudotimeto further narrow the selection.target_cluster (
Optional[str]) – If provided, only keep trajectories whose terminal cluster (forward) or origin cluster (backward) matches this value.start_pseudotime (
Optional[float]) – Lower bound of pseudotime for selecting starting cells. Defaults toNone(no lower bound). Can be used alone or combined withstart_cluster.end_pseudotime (
Optional[float]) – Upper bound of pseudotime for selecting starting cells. Defaults toNone(no upper bound). Can be used alone or combined withstart_cluster.n_trajectories (
int) – Number of successful trajectories to collect.n_steps (
int) – Maximum integration steps.step_size (
float) – Euler step size as fraction of mean velocity magnitude.direction (
str) –"forward","backward", or"both".velocity_key (
str) – Key in adata.obsm with velocity vectors.basis (
str) – Embedding key for cell coordinates.cluster_key (
str) – Cluster annotation column.k_velocity (
int) – KNN for velocity interpolation (fallback mode).use_network (
bool) – If True and the smoothing network is available, query the continuous velocity field directly.evolve_pseudotime (
bool) – If True, pseudotime evolves along the trajectory.max_attempts_factor (
int) – Whentarget_clusteris set, try up ton_trajectories * max_attempts_factorstarting cells to find enough trajectories that reach the target.random_state (
int) – Random seed.
- Return type:
- Returns:
adata with
adata.uns['velot_trajectories'].The stored metadata for each trajectory includes an
"id"field that can be used with
velot.pl.trajectories(trajectory_ids=...)to plot specific trajectories.
Examples
All backward trajectories from Alpha:
velot.tl.compute_trajectories( adata, start_cluster="Alpha", direction="backward", )
Only backward trajectories from Alpha that reach Ngn3 low EP:
velot.tl.compute_trajectories( adata, start_cluster="Alpha", direction="backward", target_cluster="Ngn3 low EP", n_trajectories=5, )
Forward from progenitors that reach Epsilon:
velot.tl.compute_trajectories( adata, start_cluster="Ngn3 low EP", direction="forward", target_cluster="Epsilon", n_trajectories=10, )
Forward from early cells (pseudotime 0 to 0.1) regardless of cluster:
velot.tl.compute_trajectories( adata, start_pseudotime=0.0, end_pseudotime=0.1, direction="forward", )
Backward from Alpha cells with pseudotime > 0.8:
velot.tl.compute_trajectories( adata, start_cluster="Alpha", start_pseudotime=0.8, direction="backward", )
- velot.tl.simulate_flow(adata, n_particles=200, source_cluster=None, source_pseudotime_min=0.0, source_pseudotime_max=0.1, source_x_lim=(-inf, inf), source_y_lim=(-inf, inf), n_steps=300, step_size=0.05, diffusion=0.5, basis='X_pca', cluster_key='clusters', noise_scale=0.0, random_state=42)[source]
Simulate a flow of particles through the learned velocity field.
Drops particles at the “top of the river” (low pseudotime or a source cluster) and integrates them forward through the velocity field with optional stochastic diffusion.
The integration follows a stochastic differential equation:
x_{n+1} = x_n + dt * v(x_n, τ_n) + sqrt(2 * D * dt) * η_n
where D is the diffusion coefficient and η is Gaussian noise. The diffusion allows particles starting from the same region to explore different branches at bifurcation points, producing realistic fate distributions.
- Parameters:
adata (
AnnData) – Must have the smoothing network trained.n_particles (
int) – Number of particles to simulate.source_cluster (
Optional[str]) – Cluster to seed particles from. If None, uses cells with lowest pseudotime.source_pseudotime_max (
float) – If source_cluster is None, seed from cells with pseudotime below this value.n_steps (
int) – Maximum number of integration steps.step_size (
float) – Euler step size as fraction of mean velocity magnitude.diffusion (
float) – Diffusion coefficient controlling stochasticity. 0.0 = deterministic (all trajectories converge). 0.1-0.5 = mild stochasticity (some branching exploration). 1.0+ = strong stochasticity (wide exploration). The noise is scaled relative to the local velocity magnitude so it is adaptive.basis (
str) – PCA embedding key.cluster_key (
str) – Cluster annotation column.noise_scale (
float) – Gaussian noise added to INITIAL positions only. Separate from diffusion which is added at every step.random_state (
int) – Random seed.source_pseudotime_min (float)
source_x_lim (tuple)
source_y_lim (tuple)
- Return type:
adata with
adata.uns['velot_flow'].
Examples
Deterministic flow (trajectories will converge):
velot.tl.simulate_flow(adata, diffusion=0.0, ...)
Stochastic flow (trajectories explore branches):
velot.tl.simulate_flow(adata, diffusion=0.3, ...)
Strong diffusion (wide exploration):
velot.tl.simulate_flow(adata, diffusion=1.0, ...)