Preprocessing (velot.pp)

Prepares an AnnData object for the VelOT pipeline. Follows the scanpy convention: functions modify adata in place and return it for optional chaining. The exception is velot.pp.prepare, which works on a copy by default (copy=True).

Typical usage:

import velot

# All-in-one (copy=True by default, so keep the returned object)
adata = velot.pp.prepare(adata, n_pcs=30, root_cluster="Root")

# Or step by step
velot.pp.normalize(adata)
velot.pp.select_genes(adata, n_hvg=2000)
velot.pp.scale(adata)
velot.pp.pca(adata, n_pcs=30)
velot.pp.neighbors(adata)
velot.pp.umap(adata)
velot.pp.pseudotime(adata, root_cluster="Root")

velot.pp.normalize(adata, target_sum=10000.0)

Total-count normalize and log-transform.

Parameters:
  • adata (AnnData) – Annotated data matrix with raw counts in adata.X.

  • target_sum (float) – Target total counts per cell after normalization.

Returns:

adata, modified in place.
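For intuition, total-count normalization followed by log1p can be sketched in plain NumPy. This is a minimal illustration, assuming a dense cells × genes count matrix; normalize_log1p is a hypothetical helper, not part of velot, and the real function may delegate to scanpy and handle sparse matrices differently.

```python
import numpy as np


def normalize_log1p(counts: np.ndarray, target_sum: float = 1e4) -> np.ndarray:
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Empty cells would divide by zero; divide them by 1 so they stay all-zero.
    safe_totals = np.where(totals > 0, totals, 1.0)
    return np.log1p(counts / safe_totals * target_sum)


# Toy count matrix: 2 cells x 3 genes.
counts = np.array([[1, 3, 0], [10, 0, 10]])
X = normalize_log1p(counts, target_sum=10000.0)
# Undoing log1p shows that every cell now sums to target_sum.
print(np.expm1(X).sum(axis=1))
```

Because each cell is rescaled to the same total before the log, differences in sequencing depth between cells no longer dominate the expression values.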

velot.pp.select_genes(adata, n_hvg=2000, flavor='seurat')

Select highly variable genes and subset the data.

Parameters:
  • adata (AnnData) – Annotated data matrix (should be log-normalized).

  • n_hvg (int) – Number of highly variable genes to keep.

  • flavor (str) – HVG selection method passed to scanpy.

Returns:

adata, subsetted to HVGs in place.
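The idea behind HVG selection can be sketched as ranking genes by dispersion (variance / mean) and keeping the top n_hvg columns. This is a deliberately simplified stand-in, not the 'seurat' flavor itself (which additionally normalizes dispersions within bins of mean expression); top_dispersion_genes is hypothetical, and the toy matrix uses counts rather than log-normalized data only to make the overdispersed gene obvious.

```python
import numpy as np


def top_dispersion_genes(X: np.ndarray, n_hvg: int) -> np.ndarray:
    """Indices of the n_hvg genes (columns) with the highest variance/mean ratio."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    # Genes with zero mean get dispersion 0 instead of a division-by-zero NaN.
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)
    ranked = np.argsort(dispersion)[::-1]  # highest dispersion first
    return np.sort(ranked[:n_hvg])  # keep the original gene order


rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 5)).astype(float)
X[:, 3] *= rng.integers(0, 5, size=200)  # make gene 3 overdispersed
keep = top_dispersion_genes(X, n_hvg=2)
X_hvg = X[:, keep]  # subset to the selected genes, like adata[:, hvg_mask]
```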

velot.pp.scale(adata)

Scale to zero mean and unit variance per gene.

This ensures PCA captures the correlation structure between genes rather than being dominated by a few highly expressed, high-variance genes.

Parameters:
  • adata (AnnData) – Annotated data matrix.

Returns:

adata, modified in place.
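The effect described above can be checked on toy data: one gene with a huge but uninformative variance, and two genes that share a common signal. The sketch below assumes dense NumPy arrays; scale_genes and pc1_loadings are hypothetical helpers, not velot functions. Without scaling, the first principal component points almost entirely at the high-variance gene; after scaling, it picks up the correlated pair.

```python
import numpy as np


def scale_genes(X: np.ndarray) -> np.ndarray:
    """Center each gene (column) to mean 0 and scale it to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant genes at 0 instead of dividing by 0
    return (X - mean) / std


def pc1_loadings(X: np.ndarray) -> np.ndarray:
    """Absolute loadings of the first principal component (SVD of centered data)."""
    _, _, vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return np.abs(vt[0])


rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)  # signal shared by genes 1 and 2
X = np.column_stack([
    100.0 * rng.normal(size=n),  # gene 0: huge variance, unrelated to the others
    z + 0.1 * rng.normal(size=n),  # genes 1 and 2: strongly correlated
    z + 0.1 * rng.normal(size=n),
])
print(pc1_loadings(X))  # dominated by gene 0
print(pc1_loadings(scale_genes(X)))  # dominated by genes 1 and 2
```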

velot.pp.pca(adata, n_pcs=30)

Compute PCA embedding.

The PCA coordinates (adata.obsm['X_pca']) define the space in which velocity is computed, so they are NOT just for visualization.
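What ends up in adata.obsm['X_pca'] is one row of principal-component scores per cell. A minimal NumPy sketch via SVD, assuming a dense, already scaled cells × genes matrix; pca_scores is hypothetical and not velot's implementation, which may use a different solver:

```python
import numpy as np


def pca_scores(X: np.ndarray, n_pcs: int) -> np.ndarray:
    """Coordinates of each cell on the top n_pcs principal components."""
    Xc = X - X.mean(axis=0)  # center each gene
    u, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]  # U * S, equivalently Xc @ V


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))  # 100 cells x 50 genes
X_pca = pca_scores(X, n_pcs=30)  # analogue of adata.obsm['X_pca']
```

The score columns are ordered by decreasing variance, so n_pcs keeps the directions that explain the most variation; any velocity vector computed in this space has n_pcs components, one per column.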

Returns:

adata with adata.obsm['X_pca'] populated.

Parameters:
  • adata (AnnData) – Annotated data matrix.

  • n_pcs (int) – Number of principal components.

velot.pp.neighbors(adata, n_pcs=30, n_neighbors=30)

Compute the KNN graph.

The KNN graph is used downstream for:
  • OT cost matrix locality penalties

  • Velocity smoothing (KNN consistency)

  • Pseudotime computation (DPT)
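A brute-force KNN graph fits in a few lines of NumPy. This is a sketch, assuming small dense data; knn_graph is hypothetical, and it uses 0/1 symmetrized connectivities, whereas real implementations (scanpy's neighbors, for example) typically weight the edges. It returns the two matrices this function stores in adata.obsp.

```python
import numpy as np


def knn_graph(X_pca: np.ndarray, n_neighbors: int):
    """Return (distances, connectivities) as dense n x n arrays."""
    n = X_pca.shape[0]
    # Pairwise Euclidean distances between all cells.
    diff = X_pca[:, None, :] - X_pca[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)  # a cell is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :n_neighbors]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = nbrs.ravel()
    # distances: only each cell's k nearest neighbors are stored (0 = no edge).
    distances = np.zeros((n, n))
    distances[rows, cols] = d[rows, cols]
    # connectivities: i ~ j if either is among the other's k nearest neighbors.
    adj = np.zeros((n, n), dtype=bool)
    adj[rows, cols] = True
    connectivities = (adj | adj.T).astype(float)
    return distances, connectivities


rng = np.random.default_rng(0)
X_pca = rng.normal(size=(60, 5))
distances, connectivities = knn_graph(X_pca, n_neighbors=10)
```

Note that the distance matrix is directed (row i lists i's neighbors), while the symmetrized connectivities are what a smoothing or diffusion step would typically walk over.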

Returns:

adata with adata.obsp['connectivities'] and adata.obsp['distances'] populated.

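Parameters:
  • adata (AnnData) – Annotated data matrix.

  • n_pcs (int) – Number of principal components to use.

  • n_neighbors (int) – Number of neighbors for the KNN graph.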

velot.pp.umap(adata)

Compute UMAP embedding (for visualization only).

VelOT does NOT compute velocity in UMAP space. UMAP coordinates are used only for plotting.

Parameters:
  • adata (AnnData) – Annotated data matrix.

Returns:

adata with adata.obsm['X_umap'] populated.

velot.pp.pseudotime(adata, *, key=None, root_cluster=None, root_cell=None, cluster_key='clusters')

Compute or load pseudotime ordering.

Three modes:
  1. key provided: load precomputed pseudotime from adata.obs[key]

  2. root_cell provided: run DPT from that cell index

  3. root_cluster provided: run DPT from the first cell in that cluster
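The choice between these modes can be sketched as a small dispatch function. resolve_pseudotime_mode is hypothetical, not part of velot. That root_cell overrides root_cluster comes from the parameter description of this function; that key takes precedence over both is an assumption, based on velot.pp.prepare skipping DPT when pseudotime_key is given; raising an error when no mode is selected is also an assumption.

```python
from typing import Optional, Sequence, Tuple, Union

import numpy as np


def resolve_pseudotime_mode(
    obs_columns: Sequence[str],
    clusters: Sequence[str],
    *,
    key: Optional[str] = None,
    root_cluster: Optional[str] = None,
    root_cell: Optional[int] = None,
) -> Tuple[str, Union[str, int]]:
    """Return ("load", column) or ("dpt", root_cell_index)."""
    if key is not None:  # assumption: a precomputed column wins
        if key not in obs_columns:
            raise KeyError(f"{key!r} is not a column of adata.obs")
        return ("load", key)
    if root_cell is not None:  # root_cell overrides root_cluster
        return ("dpt", int(root_cell))
    if root_cluster is not None:
        # First cell (lowest index) whose label equals root_cluster.
        matches = np.flatnonzero(np.asarray(clusters) == root_cluster)
        if matches.size == 0:
            raise ValueError(f"no cells in cluster {root_cluster!r}")
        return ("dpt", int(matches[0]))
    raise ValueError("pass one of key, root_cell or root_cluster")  # assumption


clusters = ["Alpha", "Ductal", "Ductal", "Beta"]
print(resolve_pseudotime_mode(["clusters"], clusters, root_cluster="Ductal"))  # ('dpt', 1)
print(resolve_pseudotime_mode(["clusters"], clusters, root_cluster="Ductal", root_cell=3))  # ('dpt', 3)
```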

DPT (Diffusion Pseudotime) computes a temporal ordering from the expression geometry alone; no velocity or spliced/unspliced information is used. This keeps the pseudotime independent of the velocity estimation.
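DPT itself works with random-walk (diffusion) distances on the KNN graph, which is too involved for a short example. As a deliberately simpler stand-in, the sketch below orders cells by shortest-path (geodesic) distance from the root along a weighted KNN graph and rescales the result to [0, 1]. It is not DPT and not velot's code; graph_pseudotime is hypothetical. What it does share with DPT is the point made above: its only input is a graph built from expression, so no velocity information enters the ordering.

```python
import heapq

import numpy as np


def graph_pseudotime(distances: np.ndarray, root: int) -> np.ndarray:
    """Shortest-path distance from `root` over a KNN graph, rescaled to [0, 1].

    distances[i, j] > 0 is an edge of that length; 0 means "no edge".
    """
    w = np.maximum(distances, distances.T)  # make the graph undirected
    dist = np.full(w.shape[0], np.inf)
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:  # Dijkstra's algorithm
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale entry, a shorter path was already found
        for j in np.flatnonzero(w[i]):
            nd = d + w[i, j]
            if nd < dist[j]:
                dist[j] = nd
                heapq.heappush(heap, (nd, int(j)))
    reachable = np.isfinite(dist)
    # Unreachable cells are placed at the end (an arbitrary choice for this sketch).
    dist[~reachable] = dist[reachable].max()
    top = dist.max()
    return dist / top if top > 0 else dist


# Five cells on a line, each linked to the next with length 1.
distances = np.zeros((5, 5))
for i in range(4):
    distances[i, i + 1] = 1.0
print(graph_pseudotime(distances, root=0))  # evenly spaced from 0 to 1 along the chain
```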

Parameters:
  • adata (AnnData) – Must already have the KNN graph computed (run velot.pp.neighbors first).

  • key (Optional[str]) – Column name in adata.obs with precomputed pseudotime.

  • root_cluster (Optional[str]) – Cluster name to use as root for DPT.

  • root_cell (Optional[int]) – Cell index to use as root for DPT. Overrides root_cluster.

  • cluster_key (str) – Column in adata.obs with cluster labels.

Returns:

adata with adata.obs['pseudotime'] populated, with values in [0, 1].

velot.pp.prepare(adata, *, n_pcs=30, n_neighbors=30, n_hvg=2000, pseudotime_key=None, root_cluster=None, root_cell=None, cluster_key='clusters', do_normalize=True, copy=True)

Full preprocessing in one call.

Runs: normalize → select_genes → scale → PCA → neighbors → UMAP → pseudotime.

Parameters:
  • adata (AnnData) – Raw or partially processed AnnData object.

  • n_pcs (int) – Number of principal components.

  • n_neighbors (int) – Number of neighbors for KNN graph.

  • n_hvg (Optional[int]) – Number of HVGs to select. Set to None to skip HVG selection.

  • pseudotime_key (Optional[str]) – Precomputed pseudotime column name. If provided, DPT is skipped.

  • root_cluster (Optional[str]) – Root cluster for DPT.

  • root_cell (Optional[int]) – Root cell index for DPT.

  • cluster_key (str) – Column with cluster labels.

  • do_normalize (bool) – Whether to normalize and log1p-transform. Set to False if adata.X is already log-normalized.

  • copy (bool) – Whether to operate on a copy of adata. If True (the default), the input adata is left unchanged, so use the returned object.
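How these options change the pipeline can be summarized with a hypothetical helper that only lists the steps that would run. planned_steps is illustrative, not part of velot, and the step names are labels; it encodes the documented rules that do_normalize=False skips normalization, n_hvg=None skips HVG selection, and pseudotime_key loads a precomputed ordering instead of running DPT.

```python
from typing import List, Optional


def planned_steps(
    *,
    do_normalize: bool = True,
    n_hvg: Optional[int] = 2000,
    pseudotime_key: Optional[str] = None,
) -> List[str]:
    """List the preprocessing steps prepare() would run for the given options."""
    steps: List[str] = []
    if do_normalize:
        steps.append("normalize")  # total-count normalize + log1p
    if n_hvg is not None:
        steps.append("select_genes")  # n_hvg=None skips HVG selection
    steps += ["scale", "pca", "neighbors", "umap"]
    # A precomputed pseudotime column replaces DPT.
    steps.append("pseudotime (load)" if pseudotime_key is not None else "pseudotime (DPT)")
    return steps


print(planned_steps(do_normalize=False, n_hvg=None, pseudotime_key="latent_time"))
```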

Returns:

The preprocessed adata (a copy when copy=True).

Example

import velot
import scvelo as scv

adata = scv.datasets.pancreas()
adata = velot.pp.prepare(adata, root_cluster="Ductal", cluster_key="clusters")