Preprocessing (velot.pp)
Prepares an AnnData object for the VelOT pipeline. Follows the scanpy convention: functions modify adata in place and return it for optional chaining.
Typical usage:
import velot
# All-in-one
adata = velot.pp.prepare(adata, n_pcs=30, root_cluster="Root")
# Or step by step
velot.pp.normalize(adata)
velot.pp.select_genes(adata, n_hvg=2000)
velot.pp.scale(adata)
velot.pp.pca(adata, n_pcs=30)
velot.pp.neighbors(adata)
velot.pp.umap(adata)
velot.pp.pseudotime(adata, root_cluster="Root")
- velot.pp.select_genes(adata, n_hvg=2000, flavor='seurat')
Select highly variable genes and subset the data.
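As a rough illustration of what "select and subset" involves, the sketch below ranks genes by a simple dispersion score (variance over mean) in NumPy and keeps the top n_hvg columns. This is a deliberate simplification: it is not the binned 'seurat' flavor and not VelOT's implementation, and the helper name top_dispersion_genes is hypothetical.

```python
import numpy as np


def top_dispersion_genes(X, n_hvg=2000):
    """Return sorted column indices of the n_hvg genes with the highest variance/mean.

    X is a dense (cells x genes) array of non-negative expression values.
    Hypothetical helper, for illustration only.
    """
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    # Genes with zero mean get dispersion 0 so they are never preferred.
    dispersion = np.divide(var, mean, out=np.zeros_like(mean, dtype=float), where=mean > 0)
    n_keep = min(n_hvg, X.shape[1])
    # argsort is ascending: take the last n_keep, then sort to preserve gene order.
    return np.sort(np.argsort(dispersion)[-n_keep:])


rng = np.random.default_rng(0)
X = rng.poisson(lam=1.0, size=(200, 50)).astype(float)
X[:, 7] *= 10  # make one gene far more variable than the rest

keep = top_dispersion_genes(X, n_hvg=10)
X_hvg = X[:, keep]  # the subsetting step: only the selected genes remain
```

After subsetting, every downstream step sees only the retained genes, which is why the choice of n_hvg affects the PCA space as well.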
- velot.pp.scale(adata)
Scale to zero mean and unit variance per gene.
This ensures PCA captures correlation structure rather than being dominated by highly-expressed genes.
- Return type:
adata, modified in place.
- Parameters:
adata (AnnData)
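To make the effect of per-gene scaling concrete, here is a minimal NumPy sketch of zero-mean, unit-variance scaling on a dense matrix. It only illustrates the transformation described above; the helper name scale_per_gene and the handling of constant genes are assumptions, not VelOT's actual code.

```python
import numpy as np


def scale_per_gene(X, eps=1e-12):
    """Center each column (gene) to mean 0 and scale it to standard deviation 1.

    Hypothetical helper, for illustration only.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Assumption: constant genes are left at zero instead of dividing by ~0.
    std = np.where(std < eps, 1.0, std)
    return (X - mean) / std


rng = np.random.default_rng(0)
# Two genes on very different expression scales.
X = np.column_stack([
    rng.normal(100.0, 50.0, size=500),  # highly expressed, high variance
    rng.normal(1.0, 0.1, size=500),     # lowly expressed, low variance
])
Z = scale_per_gene(X)
```

Before scaling, the first gene's variance is several orders of magnitude larger and would dominate the leading principal component; after scaling, both genes contribute on an equal footing.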
- velot.pp.pca(adata, n_pcs=30)
Compute PCA embedding.
The PCA coordinates (adata.obsm['X_pca']) define the space in which velocity is computed. They are NOT just for visualization.
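For readers who want to see what a PCA embedding of this kind looks like numerically, the following sketch computes one with a plain SVD in NumPy. It is a generic PCA illustration under the assumption of a dense, already scaled matrix; the helper name pca_embedding is hypothetical and the solver VelOT uses may differ.

```python
import numpy as np


def pca_embedding(X, n_pcs=30):
    """Project cells onto the top n_pcs principal components via SVD.

    Hypothetical helper, for illustration only.
    """
    Xc = X - X.mean(axis=0)  # PCA requires centered columns
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    n_pcs = min(n_pcs, S.shape[0])
    # Cell coordinates in PC space: left singular vectors scaled by singular values.
    return U[:, :n_pcs] * S[:n_pcs]


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 40))           # 100 cells x 40 genes
X_pca = pca_embedding(X, n_pcs=30)       # analogous in shape to adata.obsm['X_pca']
```

Each row of X_pca is one cell's position in the reduced space, so a velocity computed there is a vector with n_pcs components per cell.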
- velot.pp.neighbors(adata, n_pcs=30, n_neighbors=30)
Compute the KNN graph.
- The KNN graph is used downstream for:
OT cost matrix locality penalties
Velocity smoothing (KNN consistency)
Pseudotime computation (DPT)
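The sketch below shows, in plain NumPy, what a KNN graph over PCA coordinates contains: for each cell, the indices of its nearest neighbors by Euclidean distance. It is a brute-force illustration only; the helper name knn_indices is hypothetical, and a real pipeline would typically use an approximate neighbor search and store the result as a sparse graph.

```python
import numpy as np


def knn_indices(X_pca, n_neighbors=30):
    """Return an (n_cells x k) array of nearest-neighbor indices, excluding self.

    Hypothetical brute-force helper, for illustration only.
    """
    sq = np.sum(X_pca ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X_pca @ X_pca.T)
    np.fill_diagonal(d2, np.inf)  # a cell is not its own neighbor
    k = min(n_neighbors, X_pca.shape[0] - 1)
    return np.argsort(d2, axis=1)[:, :k]


# Ten cells placed on a line, so the neighbors are easy to verify by eye.
X_line = np.arange(10, dtype=float)[:, None]
idx = knn_indices(X_line, n_neighbors=2)
```

Here the two neighbors of cell 5 are cells 4 and 6. All three downstream uses listed above rely on this locality structure.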
- velot.pp.umap(adata)
Compute UMAP embedding (for visualization only).
VelOT does NOT compute velocity in UMAP space. UMAP coordinates are used only for plotting.
- Return type:
adata with adata.obsm['X_umap'] populated.
- Parameters:
adata (AnnData)
- velot.pp.pseudotime(adata, *, key=None, root_cluster=None, root_cell=None, cluster_key='clusters')
Compute or load pseudotime ordering.
- Three modes:
key provided: load precomputed pseudotime from adata.obs[key]
root_cell provided: run DPT from that cell index
root_cluster provided: run DPT from the first cell in that cluster
DPT (Diffusion Pseudotime) computes temporal ordering from the expression geometry alone; no velocity or spliced/unspliced information is used. This keeps the velocity estimation independent of the pseudotime.
- Parameters:
adata (AnnData) – Must already have the KNN graph computed (run velot.pp.neighbors first).
key (Optional[str]) – Column name in adata.obs with precomputed pseudotime.
root_cluster (Optional[str]) – Cluster name to use as root for DPT.
root_cell (Optional[int]) – Cell index to use as root for DPT. Overrides root_cluster.
cluster_key (str) – Column in adata.obs with cluster labels.
- Return type:
adata with adata.obs['pseudotime'] in [0, 1].
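The three modes and the [0, 1] output range can be sketched as a small dispatch routine. This is an illustration rather than VelOT's code: the helper names are hypothetical, and three details are assumptions not stated in the reference, namely that key takes precedence over root_cell, that an error is raised when no argument is given, and that the [0, 1] range comes from min-max rescaling. The only documented precedence is that root_cell overrides root_cluster.

```python
import numpy as np


def resolve_pseudotime_mode(clusters, key=None, root_cell=None, root_cluster=None):
    """Return ('load', key) or ('dpt', root_index). Hypothetical helper."""
    if key is not None:
        # Assumption: a precomputed column wins over any root specification.
        return ("load", key)
    if root_cell is not None:
        # Documented: root_cell overrides root_cluster.
        return ("dpt", int(root_cell))
    if root_cluster is not None:
        matches = np.flatnonzero(np.asarray(clusters) == root_cluster)
        if matches.size == 0:
            raise ValueError(f"No cells found in cluster {root_cluster!r}.")
        return ("dpt", int(matches[0]))  # first cell in that cluster
    # Assumption: calling with no mode selected is an error.
    raise ValueError("Provide one of key, root_cell, or root_cluster.")


def rescale_unit_interval(t):
    """Min-max rescale values to [0, 1] (assumed normalization)."""
    t = np.asarray(t, dtype=float)
    span = t.max() - t.min()
    return np.zeros_like(t) if span == 0 else (t - t.min()) / span


clusters = ["A", "Root", "B", "Root"]
mode_cluster = resolve_pseudotime_mode(clusters, root_cluster="Root")
mode_cell = resolve_pseudotime_mode(clusters, root_cell=3, root_cluster="Root")
mode_key = resolve_pseudotime_mode(clusters, key="my_pseudotime", root_cell=3)
unit = rescale_unit_interval([2.0, 4.0, 6.0])
```

Because "first cell in that cluster" depends on cell order, passing root_cell explicitly is the more reproducible choice when a specific starting cell is known.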
- velot.pp.prepare(adata, *, n_pcs=30, n_neighbors=30, n_hvg=2000, pseudotime_key=None, root_cluster=None, root_cell=None, cluster_key='clusters', do_normalize=True, copy=True)
Full preprocessing in one call.
Runs: normalize → select_genes → scale → PCA → neighbors → UMAP → pseudotime.
- Parameters:
adata (AnnData) – Raw or partially processed AnnData object.
n_pcs (int) – Number of principal components.
n_neighbors (int) – Number of neighbors for KNN graph.
n_hvg (Optional[int]) – Number of HVGs to select. None to skip.
pseudotime_key (Optional[str]) – Precomputed pseudotime column name. If provided, DPT is skipped.
cluster_key (str) – Column with cluster labels.
do_normalize (bool) – Whether to normalize + log1p. Set False if already done.
copy (bool) – Whether to operate on a copy of adata.
- Return type:
Preprocessed adata.
Example
import velot
import scvelo as scv
adata = scv.datasets.pancreas()
# Keep the return value: copy=True is the default, so the result may be a new object
adata = velot.pp.prepare(adata, root_cluster="Ductal", cluster_key="clusters")