quadsv.detectors.irregular

Classes

DetectorIrregular

Detect spatial patterns on irregular samples (AnnData spots / cells).

Module Contents

class quadsv.detectors.irregular.DetectorIrregular(kernel_method='matern', backend='matrix', **kernel_params)

Bases: quadsv.detectors.base.Detector

Detect spatial patterns on irregular samples (AnnData spots / cells).

Univariate (Q-test) and bivariate (R-test) kernel-based spatial statistics. Supports two backends:

  • backend='matrix': MatrixKernel (dense or implicit sparse-precision, auto-selected by n). Suitable for up to ~10⁴ spots.

  • backend='nufft': NUFFTKernel, which evaluates quadratic forms on arbitrary point sets in O(n log n). Recommended for ≥ 10⁴ spots.

The core test statistics are:

  • Univariate: \(Q = \mathbf{x}^T \mathbf{K} \mathbf{x}\)

  • Bivariate: \(R = \mathbf{x}^T \mathbf{K} \mathbf{y}\)
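As a minimal illustration of the two formulas, the sketch below builds a dense Gaussian kernel on random 2-D coordinates with plain NumPy and evaluates Q and R directly. The kernel form exp(-d²/(2h²)), the bandwidth value, and the helper name gaussian_kernel are assumptions for illustration only; this is not how MatrixKernel or NUFFTKernel are implemented, and it skips any centering or scaling the library may apply to x and y.

```python
import numpy as np

def gaussian_kernel(coords: np.ndarray, bandwidth: float) -> np.ndarray:
    """Hypothetical dense Gaussian kernel K_ij = exp(-||c_i - c_j||^2 / (2 h^2))."""
    diff = coords[:, None, :] - coords[None, :, :]   # (n, n, 2) pairwise offsets
    sq_dist = np.sum(diff ** 2, axis=-1)             # (n, n) squared distances
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 10.0, size=(200, 2))       # 200 irregular "spots"
K = gaussian_kernel(coords, bandwidth=1.5)

x = rng.standard_normal(200)                         # feature 1, one value per spot
y = rng.standard_normal(200)                         # feature 2

Q = x @ K @ x   # univariate statistic  Q = x^T K x
R = x @ K @ y   # bivariate statistic   R = x^T K y
print(f"Q = {Q:.3f}, R = {R:.3f}")
```

Because this K is symmetric, R(x, y) equals R(y, x); because the Gaussian kernel is positive semi-definite, Q is non-negative for any x.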

Workflow

  1. Construct with the kernel method, backend, and kernel hyperparameters.

  2. Set up with setup_data(), passing the anndata.AnnData and the spatial source: obsm_key for coordinates in adata.obsm, or obsp_key for a precomputed adjacency / distance matrix in adata.obsp.

  3. Compute with compute_qstat() / compute_rstat().

Parameters:
  • kernel_method (str, default 'matern') – One of 'gaussian', 'matern', 'moran', 'graph_laplacian', 'car'.

  • backend ({'matrix', 'nufft'}, default 'matrix') – Kernel backend.

  • **kernel_params – Method- and backend-specific kernel hyperparameters. Matrix backend: bandwidth, nu, rho, k_neighbors, standardize. NUFFT backend: bandwidth, nu, rho, neighbor_degree, plus the grid controls grid_shape, spacing, unit_scale, oversample, eps.

Variables:
  • backend_ ({'matrix', 'nufft'}) – Which backend was selected at construction.

  • adata (anndata.AnnData or None) – Input container set by setup_data().

  • min_cells (int or None) – Minimum non-zero count per feature; set by setup_data().

  • kernel_ (Kernel or None) – The built kernel; populated by setup_data().

  • kernel_method_, kernel_params_, n – See Detector.
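The default kernel_method='matern' is shaped by bandwidth and nu. The sketch below shows one common parameterization of the Matérn correlation for its closed-form cases ν = 0.5, 1.5, 2.5, treating bandwidth as the length scale ℓ and scaling distances by √(2ν)/ℓ. Whether quadsv uses exactly this scaling is an assumption, and matern_closed_form is a hypothetical name.

```python
import numpy as np

def matern_closed_form(d, bandwidth: float, nu: float) -> np.ndarray:
    """Hypothetical Matérn correlation for half-integer nu (length scale = bandwidth)."""
    r = np.sqrt(2.0 * nu) * np.asarray(d, dtype=float) / bandwidth
    if nu == 0.5:
        return np.exp(-r)                            # exponential kernel
    if nu == 1.5:
        return (1.0 + r) * np.exp(-r)                # once-differentiable
    if nu == 2.5:
        return (1.0 + r + r ** 2 / 3.0) * np.exp(-r) # twice-differentiable
    raise ValueError("this sketch only covers nu in {0.5, 1.5, 2.5}")

# Correlation decays from 1 at d = 0; larger nu gives a smoother decay near zero.
print(matern_closed_form([0.0, 0.5, 1.0, 3.0], bandwidth=1.0, nu=1.5))
```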

Examples

>>> import anndata as ad, numpy as np
>>> from quadsv import DetectorIrregular
>>> rng = np.random.default_rng(0)
>>> adata = ad.AnnData(X=rng.standard_normal((200, 5)))
>>> adata.obsm["spatial"] = rng.standard_normal((200, 2))
>>> det = DetectorIrregular(kernel_method="car", rho=0.9, k_neighbors=8)
>>> det.setup_data(adata, min_cells=5)  
<DetectorIrregular ...>
>>> # q = det.compute_qstat()

compute_qstat(source='var', features=None, n_jobs=-1, layer=None, return_pval=True, chunk_size='auto', show_progress=True)

Compute univariate spatial Q-statistic for selected features.

Tests each feature for significant spatial clustering or dispersion using the pre-built kernel. Parallelizes across features and applies Benjamini-Hochberg multiple testing correction.
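For reference, the Benjamini-Hochberg step-up adjustment that produces P_adj fits in a few lines of NumPy. This is a generic sketch of the standard procedure, not quadsv's internal code (which may delegate to a library routine); bh_adjust is a hypothetical name.

```python
import numpy as np

def bh_adjust(pvals) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values (step-up), returned in input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                            # ascending p-values
    ranked = p[order] * m / np.arange(1, m + 1)      # p_(i) * m / i
    # Enforce monotonicity: running minimum from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```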

Parameters:
  • source (str, default 'var') – Feature source: 'var' (genes) or 'obs' (metadata columns).

  • features (Optional[List[str]]) – Feature names to test. If None, tests all features in source.

  • n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.

  • layer (Optional[str]) – If source='var', which layer to use (e.g. 'raw', 'log1p'). If None, uses .X.

  • return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns Q only.

  • chunk_size (int or 'auto', default 'auto') – Number of features each worker densifies at once (inner batch). 'auto' targets ~256 MB per batch via _auto_chunk_size(), which sets chunk_size = clip(256 MB / (4 · n · 8 B), 16, 512): the memory budget divided by 4 · n · 8 bytes per feature, clipped to the range [16, 512]. Pass an integer when memory is tight or you want deterministic batching.

  • show_progress (bool, default True) – Show a tqdm progress bar over worker chunks.
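To see how the 'auto' chunk_size heuristic scales with the number of spots n, here is a small sketch of the documented formula. It assumes 256 MB means 256 · 1024² bytes and that the quotient is floored to an integer; auto_chunk_size_sketch is a hypothetical name, and the library's private _auto_chunk_size() may round differently.

```python
import numpy as np

def auto_chunk_size_sketch(n_obs: int, budget_bytes: int = 256 * 1024**2,
                           lo: int = 16, hi: int = 512) -> int:
    """Hypothetical re-implementation of the documented 'auto' heuristic."""
    bytes_per_feature = 4 * n_obs * 8            # 4 * n * 8 B per feature, as documented
    raw = budget_bytes // bytes_per_feature      # features that fit in the budget
    return int(np.clip(raw, lo, hi))             # keep within [16, 512]

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, auto_chunk_size_sketch(n))
```

Small datasets hit the 512 ceiling, while very large ones fall back to the 16-feature floor.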

Returns:

df – Results sorted by Q (descending). Columns:

  • Feature: feature name

  • Q: test statistic (univariate spatial variability)

  • Z_score: Q standardized by the null mean/std

  • P_value: tail probability under the null (if return_pval=True)

  • P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)

Return type:

pd.DataFrame

Raises:

ValueError – If kernel not initialized, or source is invalid.

Notes

Under H₀, the feature has no spatial structure; under H₁, it carries a significant spatial signal (clustering or dispersion).

Zero-variance features are assigned Q=0, P_value=1.0.

The null-distribution approximation is auto-selected from self.kernel_method_ ('clt' for Moran’s I, 'welch' for all other kernels) and cannot be overridden through this method. For full control over the null method (including 'liu'), call quadsv.statistics.spatial_q_test() directly.
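The 'welch' option refers to moment matching of the quadratic form. A minimal sketch of the Welch-Satterthwaite approximation, assuming the entries of x are i.i.d. standard normal under H₀ and K is symmetric with tr(K) > 0: then E[Q] = tr(K) and Var[Q] = 2 tr(K²), and Q is approximated by a scaled chi-square a·χ²_g with a = tr(K²)/tr(K) and g = tr(K)²/tr(K²). The same two moments give a Z_score. How quadsv standardizes features before forming Q is not covered here, so treat this as an illustration of the technique, not the library's implementation; welch_q_test is a hypothetical name.

```python
import numpy as np
from scipy import stats

def welch_q_test(x: np.ndarray, K: np.ndarray):
    """Hypothetical Welch-Satterthwaite test for Q = x^T K x with x ~ N(0, I) under H0."""
    q = float(x @ K @ x)
    tr_k = float(np.trace(K))                # null mean E[Q]
    tr_k2 = float(np.sum(K * K.T))           # tr(K^2) without forming K @ K
    z = (q - tr_k) / np.sqrt(2.0 * tr_k2)    # Z_score from the null mean/std
    scale = tr_k2 / tr_k                     # a
    dof = tr_k ** 2 / tr_k2                  # g
    p = stats.chi2.sf(q / scale, dof)        # upper-tail p-value of a * chi2_g
    return q, z, p

# Sanity check: with K = I the approximation is exact, since Q ~ chi2_n.
rng = np.random.default_rng(1)
n = 100
K = np.eye(n)
x = rng.standard_normal(n)
q, z, p = welch_q_test(x, K)
print(q, z, p)
```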

Examples

>>> detector.setup_data(adata)
>>> results = detector.compute_qstat(source='var', features=['Gene1', 'Gene2'], n_jobs=-1)
>>> top_genes = results.iloc[:10]

compute_rstat(features_x=None, features_y=None, source='var', n_jobs=-1, layer=None, return_pval=True, chunk_size='auto', show_progress=True)

Compute bivariate spatial R-statistic (cross-spatial correlation) for feature pairs.

Tests for significant spatial co-variation between pairs of features using the pre-built kernel. Supports symmetric (all pairs within one set) or bipartite (all X vs Y pairs) modes. Parallelizes computation and applies multiple testing correction.

Parameters:
  • features_x (Optional[List[str]]) – Feature names for the first set. If None and features_y is None, uses all features (symmetric mode).

  • features_y (Optional[List[str]]) – Feature names for the second set. If None, computes all pairwise within features_x. If provided, computes all X vs Y pairs (bipartite mode).

  • source (str, default 'var') – Feature source: 'var' (genes) or 'obs' (metadata columns).

  • n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.

  • layer (Optional[str]) – If source='var', which layer to use (e.g. 'raw', 'log1p'). If None, uses .X.

  • return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns R only.

  • chunk_size (int or 'auto', default 'auto') – Number of Y features to batch together when pre-computing K @ Y_chunk. 'auto' uses _auto_chunk_size() (~256 MB per batch target); integer values override the heuristic.

  • show_progress (bool, default True) – Show a tqdm progress bar over the Y-chunk loop.
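The chunk_size description implies that the bivariate computation is organized around K @ Y_chunk. Below is a sketch of that batching with a dense NumPy kernel. The function name r_matrix_chunked and the dense K are assumptions (the matrix and NUFFT backends apply K differently), and no p-values are computed.

```python
import numpy as np

def r_matrix_chunked(X: np.ndarray, Y: np.ndarray, K: np.ndarray, chunk_size: int) -> np.ndarray:
    """Hypothetical batched R_ij = x_i^T K y_j for all columns of X (n, p) and Y (n, q)."""
    out = np.empty((X.shape[1], Y.shape[1]))
    for start in range(0, Y.shape[1], chunk_size):
        stop = min(start + chunk_size, Y.shape[1])
        KY = K @ Y[:, start:stop]            # pre-compute K @ Y_chunk once per chunk
        out[:, start:stop] = X.T @ KY        # every X feature against this Y chunk
    return out

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
K = A @ A.T / n                              # any symmetric kernel works for the check
X = rng.standard_normal((n, 3))
Y = rng.standard_normal((n, 7))
R = r_matrix_chunked(X, Y, K, chunk_size=2)  # last chunk holds a single Y column
```

Larger chunks mean fewer passes over K, at the cost of an n × chunk_size intermediate.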

Returns:

df – Results sorted by absolute Z_score (descending). Columns:

  • Feature_1: name of first feature

  • Feature_2: name of second feature

  • R: test statistic (bivariate spatial correlation, range approximately [-1, 1])

  • Z_score: standardized R by null mean/std

  • P_value: two-tailed p-value under null (if return_pval=True)

  • P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)

Return type:

pd.DataFrame

Raises:

ValueError – If kernel not initialized, features_x is None when features_y is provided, or no valid pairs generated.

Notes

Under H₀, the features are spatially independent; under H₁, they show significant spatial co-clustering or co-dispersion.

Unlike quadsv.statistics.spatial_r_test(), in symmetric mode (features_y=None) this method returns R for every ordered pair of the requested features, including self-pairs and both orderings. For features_x=[A, B, C], the output contains (A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B), (C, C).
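Both modes amount to a Cartesian product of feature names. The sketch below reproduces the pair lists described for features_x and features_y with itertools.product; feature_pairs is a hypothetical helper, and the library may order pairs differently internally.

```python
from itertools import product

def feature_pairs(features_x, features_y=None):
    """Hypothetical pair enumeration mirroring the documented modes."""
    if features_y is None:
        # Symmetric mode: every ordered pair, including self-pairs.
        return list(product(features_x, repeat=2))
    # Bipartite mode: every X feature against every Y feature.
    return list(product(features_x, features_y))

print(feature_pairs(["A", "B", "C"]))
print(feature_pairs(["Gene1", "Gene2"], ["Gene3", "Gene4"]))
```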

P-value calculation uses a normal approximation based on Tr(K²) and is not configurable through this method. For finer control over the null model, call quadsv.statistics.spatial_r_test() directly.
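A minimal sketch of a normal approximation based on tr(K²), assuming x and y are independent under H₀ with i.i.d. standard normal entries and K is symmetric: then E[R] = 0 and Var[R] = tr(K²), so Z = R / √tr(K²) and the two-tailed p-value is 2·Φ(−|Z|). The unnormalized R = xᵀKy and the helper name normal_r_test are assumptions; the R reported by compute_rstat() is normalized to roughly [-1, 1], so its scale differs.

```python
import numpy as np
from scipy import stats

def normal_r_test(x: np.ndarray, y: np.ndarray, K: np.ndarray):
    """Hypothetical normal-approximation test for R = x^T K y under independence."""
    r = float(x @ K @ y)
    tr_k2 = float(np.sum(K * K.T))       # tr(K^2); equals ||K||_F^2 for symmetric K
    z = r / np.sqrt(tr_k2)               # null mean 0, null variance tr(K^2)
    p = 2.0 * stats.norm.sf(abs(z))      # two-tailed p-value
    return r, z, p

# Monte Carlo check of the null variance on a small symmetric kernel.
rng = np.random.default_rng(0)
n = 30
A = rng.standard_normal((n, n))
K = (A + A.T) / 2.0
Xs = rng.standard_normal((20_000, n))
Ys = rng.standard_normal((20_000, n))
draws = np.sum((Xs @ K) * Ys, axis=1)    # 20,000 null draws of x^T K y
print(draws.var(), np.sum(K * K.T))      # the two values should be close
r, z, p = normal_r_test(rng.standard_normal(n), rng.standard_normal(n), K)
```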

Pairs involving a zero-variance feature are assigned R=0 and P_value=1.

Examples

>>> detector.setup_data(adata)
>>> # All pairwise correlations within gene set
>>> results = detector.compute_rstat(features_x=['Gene1', 'Gene2', 'Gene3'], n_jobs=-1)
>>> # Cross-correlation between two gene sets
>>> results = detector.compute_rstat(
...     features_x=['Gene1', 'Gene2'],
...     features_y=['Gene3', 'Gene4'],
...     n_jobs=-1
... )

setup_data(adata, *, obsm_key='spatial', obsp_key=None, is_distance=False, min_cells=1, min_cells_frac=None)

Attach adata, apply feature filters, build the kernel.

Parameters:
  • adata (anndata.AnnData) – Input container. Must have adata.obsm[obsm_key] (unless obsp_key is provided instead).

  • obsm_key (str, default 'spatial') – Key in adata.obsm holding (n_obs, 2) spatial coordinates. Used when obsp_key is None.

  • obsp_key (str, optional) – If provided, build the kernel from adata.obsp[obsp_key] instead of from coordinates. Not compatible with backend='nufft'.

  • is_distance (bool, default False) – When obsp_key is given: treat the matrix as pairwise distances (True) or adjacency / connectivity (False).

  • min_cells (int, default 1) – Minimum number of cells with non-zero value for a feature to be tested. Clamped to [1, n_obs].

  • min_cells_frac (float, optional) – If provided, overrides min_cells with max(1, int(min_cells_frac * n_obs)).

Returns:

self

Return type:

DetectorIrregular
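The min_cells / min_cells_frac rules can be sketched as follows. resolve_min_cells and passing_features are hypothetical helpers; the sketch assumes a dense expression matrix (a sparse .X would need a sparse-aware count) and that the [1, n_obs] clamp is applied after min_cells_frac is resolved.

```python
import numpy as np

def resolve_min_cells(n_obs: int, min_cells: int = 1, min_cells_frac=None) -> int:
    """Hypothetical resolution of the documented min_cells / min_cells_frac rules."""
    if min_cells_frac is not None:
        min_cells = max(1, int(min_cells_frac * n_obs))  # fraction overrides the count
    return int(np.clip(min_cells, 1, n_obs))             # assumed clamp to [1, n_obs]

def passing_features(X: np.ndarray, min_cells: int) -> np.ndarray:
    """Boolean mask of features (columns) with at least min_cells non-zero values."""
    return np.count_nonzero(X, axis=0) >= min_cells

X = np.array([[0, 1, 2],
              [0, 0, 3],
              [0, 1, 0],
              [5, 0, 1]])
m = resolve_min_cells(n_obs=X.shape[0], min_cells=2)
print(m, passing_features(X, m))  # non-zero counts per column are 1, 2, 3
```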

adata: Any | None = None

Reference to the input anndata.AnnData, set by setup_data().

backend_: str = 'matrix'

Which backend will build the kernel — 'matrix' or 'nufft'.

min_cells: int | None = None

Minimum non-zero-count threshold applied in setup_data().
