quadsv.detectors.irregular


Classes

DetectorIrregular

Detect spatial patterns on irregular samples (AnnData spots / cells).

Module Contents

class quadsv.detectors.irregular.DetectorIrregular(kernel_method='matern', backend='matrix', **kernel_params)

Bases: quadsv.detectors.base.Detector

Detect spatial patterns on irregular samples (AnnData spots / cells).

Univariate (Q-test) and bivariate (R-test) kernel-based spatial statistics. Supports two backends:

backend='matrix' — MatrixKernel (dense or implicit sparse-precision, auto-selected by n). Good up to ~10⁴ spots.
backend='nufft' — NUFFTKernel, O(n log n) quadratic forms on arbitrary point sets. Recommended for ≥ 10⁴ spots.
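The size guidance above can be turned into a small selection rule. The sketch below is illustrative only: `suggest_backend`, its `threshold` argument, and the `"connectivities"` key are hypothetical names, not part of quadsv. It also folds in the restriction, documented under setup_data(), that obsp_key is not compatible with backend='nufft'.

```python
def suggest_backend(n_obs, obsp_key=None, threshold=10_000):
    """Hypothetical helper: pick a backend name from the documented guidance."""
    if obsp_key is not None:
        # A precomputed adjacency / distance matrix needs the matrix backend.
        return "matrix"
    # At or above ~10^4 spots the NUFFT backend is the recommended choice.
    return "nufft" if n_obs >= threshold else "matrix"


small = suggest_backend(2_000)
large = suggest_backend(50_000)
graph = suggest_backend(50_000, obsp_key="connectivities")
```

The returned string could then be passed as the backend argument of DetectorIrregular.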

The core test statistics are:

Univariate: \(Q = \mathbf{x}^T \mathbf{K} \mathbf{x}\)
Bivariate: \(R = \mathbf{x}^T \mathbf{K} \mathbf{y}\)
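To make the two quadratic forms concrete, here is a minimal pure-Python sketch on four points. It assumes a plain Gaussian kernel of the form exp(-d² / (2 · bandwidth²)) with no centering or standardization; the kernels quadsv actually builds may differ, and `gaussian_kernel` and `quad_form` are illustrative names only.

```python
import math


def gaussian_kernel(coords, bandwidth=1.0):
    """Dense kernel K[i][j] = exp(-d_ij^2 / (2 * bandwidth^2)) (assumed form)."""
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            K[i][j] = math.exp(-d2 / (2.0 * bandwidth ** 2))
    return K


def quad_form(x, K, y):
    """Return x^T K y for plain Python lists."""
    return sum(x[i] * K[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


# Two tight pairs of points, far apart from each other.
coords = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
smooth = [1.0, 1.0, -1.0, -1.0]  # neighbouring points share a value
rough = [1.0, -1.0, 1.0, -1.0]   # neighbouring points alternate

K = gaussian_kernel(coords)
q_smooth = quad_form(smooth, K, smooth)  # Q for the spatially smooth feature
q_rough = quad_form(rough, K, rough)     # Q for the alternating feature
r_cross = quad_form(smooth, K, rough)    # R between the two features
```

With this kernel, the spatially smooth feature yields the larger Q, and the R between the two features is close to zero because their neighbour contributions cancel.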

Workflow

Construct with a kernel method, a backend, and kernel hyperparameters.
Set up with setup_data(), passing the anndata.AnnData plus a spatial source (obsm_key for coordinates in adata.obsm, or obsp_key for a precomputed adjacency / distance matrix in adata.obsp).
Compute with compute_qstat() / compute_rstat().

Parameters:

kernel_method (str, default 'matern') – One of 'gaussian', 'matern', 'moran', 'graph_laplacian', 'car'.
backend ({'matrix', 'nufft'}, default 'matrix') – Kernel backend.
**kernel_params – Method- and backend-specific kernel hyperparameters. Matrix backend: bandwidth, nu, rho, k_neighbors, standardize. NUFFT backend: bandwidth, nu, rho, neighbor_degree, plus grid controls grid_shape, spacing, unit_scale, oversample, eps.

Variables:

backend_ ({'matrix', 'nufft'}) – Which backend was selected at construction.
adata (anndata.AnnData or None) – Input container set by setup_data().
min_cells (int or None) – Minimum non-zero count per feature; set by setup_data().
kernel_ (Kernel or None) – The built kernel; populated by setup_data().
kernel_method_, kernel_params_, n – See Detector.

Examples

>>> import anndata as ad, numpy as np
>>> from quadsv import DetectorIrregular
>>> rng = np.random.default_rng(0)
>>> adata = ad.AnnData(X=rng.standard_normal((200, 5)))
>>> adata.obsm["spatial"] = rng.standard_normal((200, 2))
>>> det = DetectorIrregular(kernel_method="car", rho=0.9, k_neighbors=8)
>>> det.setup_data(adata, min_cells=5)  
<DetectorIrregular ...>
>>> # q = det.compute_qstat()

compute_qstat(source='var', features=None, n_jobs=-1, layer=None, return_pval=True, chunk_size='auto', show_progress=True)

Compute univariate spatial Q-statistic for selected features.

Tests each feature for significant spatial clustering or dispersion using the pre-built kernel. Parallelizes across features and applies Benjamini-Hochberg multiple testing correction.

Parameters:

source (str, default 'var') – Feature source: 'var' (genes) or 'obs' (metadata columns).
features (Optional[List[str]]) – Feature names to test. If None, tests all features in source.
n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.
layer (Optional[str]) – If source='var', which layer to use (e.g., 'raw', 'log1p'). If None, uses .X.
return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns Q only.
chunk_size (int or 'auto', default 'auto') – Number of features each worker densifies at once (inner batch). 'auto' targets ~256 MB per batch using _auto_chunk_size(), yielding chunk_size ≈ clip(16, 512, 256 MB / (4 · n · 8 B)). Override with an integer when memory is tight or you want deterministic batching.
show_progress (bool, default True) – Show a tqdm progress bar over worker chunks.
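The 'auto' heuristic quoted for chunk_size can be read as "divide a 256 MB budget by 4 · n · 8 bytes, then clamp to [16, 512]". The sketch below follows that reading; it is not the library's _auto_chunk_size(), the function name is hypothetical, and it assumes 1 MB = 2²⁰ bytes and integer (floor) division.

```python
def auto_chunk_size_sketch(n, target_bytes=256 * 1024 ** 2, lo=16, hi=512):
    """Illustrative reading of clip(16, 512, 256 MB / (4 * n * 8 B))."""
    raw = target_bytes // (4 * n * 8)  # features that fit in the memory budget
    return max(lo, min(hi, raw))       # clamp to the documented [16, 512] range


chunk_10k = auto_chunk_size_sketch(10_000)      # budget allows more than 512
chunk_100k = auto_chunk_size_sketch(100_000)    # inside the range
chunk_1m = auto_chunk_size_sketch(1_000_000)    # budget allows fewer than 16
```

Under these assumptions the batch shrinks as n grows, which is why passing an explicit integer can help when memory is tight.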

Returns:

df – Results sorted by Q (descending). Columns:

Feature: feature name
Q: test statistic (univariate spatial variability)
Z_score: Q standardized by the null mean and standard deviation
P_value: tail probability under the null (if return_pval=True)
P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)

Return type:

pd.DataFrame
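For readers unfamiliar with the P_adj column, the following is a minimal sketch of the standard Benjamini-Hochberg step-up adjustment in pure Python. It illustrates the textbook procedure only; `bh_adjust` is an illustrative name, and quadsv's internal implementation is not shown in this reference.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_adj = bh_adjust([0.01, 0.04, 0.03, 0.5])
```

Each raw p-value is scaled by m / rank, and the running minimum keeps the adjusted values non-decreasing in rank.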

Raises:

ValueError – If kernel not initialized, or source is invalid.

Notes

Under H₀: feature has no spatial structure. Under H₁: significant spatial signal (clustering or dispersion).

Zero-variance features are assigned Q=0, P_value=1.0.

The null-distribution approximation is auto-selected from self.kernel_method_ ('clt' for Moran’s I, 'welch' for all other kernels) and cannot be overridden through this method. For full control over the null method (including 'liu'), call quadsv.statistics.spatial_q_test() directly.

Examples

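>>> from quadsv import DetectorIrregular
>>> # adata: an anndata.AnnData with coordinates in adata.obsm['spatial']
>>> detector = DetectorIrregular(kernel_method='matern').setup_data(adata)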
>>> results = detector.compute_qstat(source='var', features=['Gene1', 'Gene2'], n_jobs=-1)
>>> top_genes = results.iloc[:10]

compute_rstat(features_x=None, features_y=None, source='var', n_jobs=-1, layer=None, return_pval=True, chunk_size='auto', show_progress=True)

Compute bivariate spatial R-statistic (cross-spatial correlation) for feature pairs.

Tests for significant spatial co-variation between pairs of features using the pre-built kernel. Supports symmetric (all pairs within one set) or bipartite (all X vs Y pairs) modes. Parallelizes computation and applies multiple testing correction.

Parameters:

features_x (Optional[List[str]]) – Feature names for the first set. If None and features_y is None, uses all features (symmetric mode).
features_y (Optional[List[str]]) – Feature names for the second set. If None, computes all pairwise within features_x. If provided, computes all X vs Y pairs (bipartite mode).
source (str, default 'var') – Feature source: 'var' (genes) or 'obs' (metadata columns).
n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.
layer (Optional[str]) – If source='var', which layer to use (e.g., 'raw', 'log1p'). If None, uses .X.
return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns R only.
chunk_size (int or 'auto', default 'auto') – Number of Y features to batch together when pre-computing K @ Y_chunk. 'auto' uses _auto_chunk_size() (~256 MB per batch target); integer values override the heuristic.
show_progress (bool, default True) – Show a tqdm progress bar over the Y-chunk loop.

Returns:

df – Results sorted by absolute Z_score (descending). Columns:

Feature_1: name of first feature
Feature_2: name of second feature
R: test statistic (bivariate spatial correlation, range approximately [-1, 1])
Z_score: standardized R by null mean/std
P_value: two-tailed p-value under null (if return_pval=True)
P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)

Return type:

pd.DataFrame

Raises:

ValueError – If kernel not initialized, features_x is None when features_y is provided, or no valid pairs generated.

Notes

Under H₀: features are spatially independent. Under H₁: significant spatial co-clustering or co-dispersion.

Unlike quadsv.statistics.spatial_r_test(), this method always returns R-statistics for all requested feature pairs in the symmetric mode (features_y=None). For features_x=[A, B, C], the output contains (A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B), (C, C).
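The pair enumeration described here matches a full Cartesian product, including self-pairs and both orderings. A short sketch with itertools reproduces it; `enumerate_pairs` is a hypothetical helper for illustration, and the bipartite branch assumes all X vs Y pairs are listed in the same row-major order.

```python
from itertools import product


def enumerate_pairs(features_x, features_y=None):
    """Hypothetical helper: list the feature pairs each mode would cover."""
    if features_y is None:
        # Symmetric mode: every ordered pair within features_x, self-pairs included.
        return list(product(features_x, repeat=2))
    # Bipartite mode: every X feature against every Y feature.
    return list(product(features_x, features_y))


symmetric = enumerate_pairs(["A", "B", "C"])
bipartite = enumerate_pairs(["A", "B"], ["C", "D"])
```

Symmetric mode on k features therefore produces k² rows, while bipartite mode produces len(features_x) · len(features_y) rows.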

P-value calculation uses a normal approximation based on Tr(K²) and is not configurable through this method. For finer control over the null model, call quadsv.statistics.spatial_r_test() directly.

Zero-variance features are handled gracefully (assigned R=0, P=1).
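As a rough illustration of a normal approximation based on Tr(K²): if x and y are independent with zero mean and unit variance per entry, then R = xᵀKy has mean 0 and variance Tr(K²) for symmetric K. The sketch below uses that textbook result to form a Z-score and a two-tailed p-value. Whether quadsv centers or rescales K or the features first is not stated here, so treat the function names and exact scaling as assumptions.

```python
import math


def trace_of_square(K):
    """Tr(K @ K) for a square matrix stored as nested lists."""
    n = len(K)
    return sum(K[i][j] * K[j][i] for i in range(n) for j in range(n))


def r_normal_pvalue(r, K):
    """Z-score and two-tailed normal p-value, assuming Var[R] = Tr(K^2)."""
    z = r / math.sqrt(trace_of_square(K))
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Small symmetric toy kernel; Tr(K^2) is the sum of squared entries.
K = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.5],
     [0.0, 0.5, 1.0]]
z_null, p_null = r_normal_pvalue(0.0, K)
z_big, p_big = r_normal_pvalue(6.0, K)
```

An R of zero gives a p-value of 1, and the p-value shrinks symmetrically as R moves away from zero in either direction.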

Examples

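>>> from quadsv import DetectorIrregular
>>> # adata: an anndata.AnnData with coordinates in adata.obsm['spatial']
>>> detector = DetectorIrregular(kernel_method='matern').setup_data(adata)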
>>> # All pairwise correlations within gene set
>>> results = detector.compute_rstat(features_x=['Gene1', 'Gene2', 'Gene3'], n_jobs=-1)
>>> # Cross-correlation between two gene sets
>>> results = detector.compute_rstat(
...     features_x=['Gene1', 'Gene2'],
...     features_y=['Gene3', 'Gene4'],
...     n_jobs=-1
... )

setup_data(adata, *, obsm_key='spatial', obsp_key=None, is_distance=False, min_cells=1, min_cells_frac=None)

Attach adata, apply feature filters, build the kernel.

Parameters:

adata (anndata.AnnData) – Input container. Must have adata.obsm[obsm_key] (unless obsp_key is provided instead).
obsm_key (str, default 'spatial') – Key in adata.obsm holding (n_obs, 2) spatial coordinates. Used when obsp_key is None.
obsp_key (str, optional) – If provided, build the kernel from adata.obsp[obsp_key] instead of from coordinates. Not compatible with backend='nufft'.
is_distance (bool, default False) – When obsp_key is given: treat the matrix as pairwise distances (True) or adjacency / connectivity (False).
min_cells (int, default 1) – Minimum number of cells with non-zero value for a feature to be tested. Clamped to [1, n_obs].
min_cells_frac (float, optional) – If provided, overrides min_cells with max(1, int(min_cells_frac * n_obs)).
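The interaction between min_cells and min_cells_frac can be sketched as follows. This is an illustration of the rules stated above, not the library's code: `resolve_min_cells` and `passes_filter` are hypothetical names, and applying the [1, n_obs] clamp after the fractional override is an assumption.

```python
def resolve_min_cells(n_obs, min_cells=1, min_cells_frac=None):
    """Hypothetical helper: effective non-zero-count threshold for n_obs cells."""
    if min_cells_frac is not None:
        # The fractional form overrides the absolute count.
        min_cells = max(1, int(min_cells_frac * n_obs))
    # Clamp to [1, n_obs] (assumed to apply in both cases).
    return max(1, min(n_obs, min_cells))


def passes_filter(values, threshold):
    """True if the feature has at least `threshold` non-zero entries."""
    return sum(1 for v in values if v != 0) >= threshold


abs_threshold = resolve_min_cells(200, min_cells=5)
frac_threshold = resolve_min_cells(200, min_cells_frac=0.25)
tiny_frac = resolve_min_cells(200, min_cells_frac=0.001)
kept = passes_filter([0, 3, 0, 1, 2], threshold=3)
```

A very small fraction still yields a threshold of 1, so a feature with at least one non-zero cell is never excluded by rounding alone.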

Returns:

self

Return type:

DetectorIrregular

adata: Any | None = None

Reference to the input anndata.AnnData, set by setup_data().

backend_: str = 'matrix'

Which backend will build the kernel: 'matrix' or 'nufft'.

min_cells: int | None = None

Minimum non-zero-count threshold applied in setup_data().
