quadsv.detector#

High-level pattern detection for AnnData objects.

class quadsv.detector.PatternDetector(adata: Any, min_cells: int = 1, min_cells_frac: float | None = None)[source]#

Bases: object

Detect spatial or graph-based feature variability using the quadratic-form statistic.

This class implements univariate (Q-test) and bivariate (R-test) spatial statistics for feature screening using kernel-based methods. Supports multiple kernel types (Gaussian, Matérn, Moran’s I, graph Laplacian, CAR) and integrates with scanpy/AnnData.

The core test statistic is: - Univariate: $Q = mathbf{x}^T mathbf{K} mathbf{x}$ for standardized feature vector $mathbf{x}$ and kernel $mathbf{K}$. - Bivariate: $R = mathbf{x}^T mathbf{K} mathbf{y}$ for cross-spatial correlation.

Parameters:

adata (scanpy.AnnData) – Annotated data matrix containing gene expression or other features.
min_cells (int, default 1) – Minimum number of non-zero observations required for a feature to be tested.
min_cells_frac (float, optional) – If provided, overrides min_cells to be this fraction of total cells.

adata#

Reference to the input data object.

Type:: scanpy.AnnData

n#

Number of samples (cells/observations).

Type:: int

kernel_#

Active kernel object (set by build_kernel_* methods).

Type:: Kernel or None

kernel_method_#

Method used to construct the current kernel.

Type:: str or None

kernel_params_#

Parameters used to construct the current kernel.

Type:: dict or None

build_kernel_from_coordinates(coords, method='car', \*\*kernel_params)[source]#: Construct kernel from spatial coordinates.

build_kernel_from_obsp(key, is_distance=False, method='car', \*\*kernel_params)[source]#: Construct kernel from precomputed distance/adjacency matrix.

compute_qstat(source='var', features=None, n_jobs=-1, layer=None, return_pval=True)[source]#: Compute univariate spatial statistics.

compute_rstat(features_x=None, features_y=None, source='var', n_jobs=-1, layer=None, return_pval=True)[source]#: Compute bivariate spatial statistics.

Examples

>>> import anndata
>>> adata = anndata.read_h5ad('data.h5ad')
>>> detector = PatternDetector(adata, min_cells=10)
>>> detector.build_kernel_from_coordinates(adata.obsm['spatial'], method='gaussian', bandwidth=2.0)
>>> q_results = detector.compute_qstat(source='var', n_jobs=4)

__init__(adata: Any, min_cells: int = 1, min_cells_frac: float | None = None) → None[source]#

Initializes the PatternDetector with an AnnData object.

Parameters:

adata (scanpy.AnnData) – Annotated data matrix.
min_cells (int, default 1) – Minimum number of cells with non-zero expression for feature inclusion.
min_cells_frac (float, optional) – If provided, overrides min_cells to be max(1, int(min_cells_frac * n_cells)).

build_kernel_from_coordinates(coords: ndarray, method: str = 'car', **kernel_params: Any) → None[source]#

Builds a spatial kernel from an array of coordinates.

Constructs distance-based kernel matrix from sample coordinates using specified method. Updates self.kernel_ and self.kernel_params_.

Parameters:

coords (np.ndarray) – Coordinate array of shape (n_samples, n_dims). Typically 2D spatial coordinates.
method (str, default 'car') – Kernel construction method. Must be one of: ‘gaussian’, ‘matern’, ‘moran’, ‘graph_laplacian’, ‘car’.
kernel_params (dict, optional) – Additional kernel-specific parameters (e.g., bandwidth, rho, k_neighbors, nu).

Raises:

ValueError – If method not in available kernels or coordinate shape mismatch.

Notes

Common parameters by method: - gaussian: bandwidth (float, default 1.0) - matern: bandwidth (float, default 1.0), nu (float, default 1.5) - moran: k_neighbors (int, default 10) - graph_laplacian: k_neighbors (int, default 10) - car: rho (float in [0, 1), default 0.9), k_neighbors (int, default 10)

Examples

>>> coords = np.random.randn(1000, 2)
>>> detector.build_kernel_from_coordinates(coords, method='gaussian', bandwidth=1.5)

build_kernel_from_obsp(key: str, is_distance: bool = False, method: str = 'car', **kernel_params: Any) → None[source]#

Builds a graph kernel from a precomputed matrix in adata.obsp.

Performs necessary transformations (e.g., distance to Gaussian weights, adjacency to graph Laplacian) before initializing the kernel object. Isolated nodes (zero degree) are automatically removed.

Parameters:

key (str) – Key in adata.obsp containing the precomputed matrix.
is_distance (bool, default False) – If True, matrix is treated as distances. Otherwise treated as adjacency/connectivity.
method (str, default 'car') – Kernel construction method. Must be one of: ‘gaussian’, ‘matern’, ‘moran’, ‘graph_laplacian’, ‘car’, or ‘precomputed’. - ‘precomputed’: Uses obsp[key] directly as kernel matrix K (no transformation). - Other methods: Apply distance/adjacency transformations.
kernel_params (dict, optional) – Additional kernel-specific parameters (e.g., bandwidth, rho, nu).

Raises:

KeyError – If key not found in adata.obsp.
ValueError – If method not recognized or invalid for matrix type (e.g., Gaussian on adjacency).

Notes

For distance matrices (is_distance=True): - gaussian: $K = exp(-d^2 / (2 sigma^2))$ where $sigma$ is bandwidth. - matern: Matérn kernel with nu and bandwidth parameters.

For adjacency matrices (is_distance=False): - Symmetric normalization: $W_{norm} = D^{-1/2} W D^{-1/2}$. - moran: Uses W_norm directly as kernel. - graph_laplacian: $K = I - W_{norm}$. - car: $K = (I - rho W_{norm})^{-1}$ (implicit kernel).

Isolated nodes (degree=0) are automatically removed from the data.

Examples

>>> detector.build_kernel_from_obsp('distances', is_distance=True, method='gaussian', bandwidth=5.0)

compute_qstat(source: str = 'var', features: List[str] | None = None, n_jobs: int = -1, layer: str | None = None, return_pval: bool = True, chunk_size: int = 64) → DataFrame[source]#

Compute univariate spatial Q-statistic for selected features.

Tests each feature for significant spatial clustering or dispersion using the pre-built kernel. Parallelizes across features and applies Benjamini-Hochberg multiple testing correction.

Parameters:

source (str, default 'var') – Feature source: ‘var’ (genes) or ‘obs’ (metadata columns).
features (Optional[List[str]]) – Feature names to test. If None, tests all features in source.
n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.
layer (Optional[str]) – If source=’var’, which layer to use (e.g., ‘raw’, ‘log1p’). If None, uses .X.
return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns Q only.
chunk_size (int, default 64) – Number of features to process in each internal batch for vectorization. If n_jobs > 1, each worker processes chunk_size features at a time. May be adjusted for memory/speed tradeoff.

Returns:

df – Results sorted by Q (descending). Columns: - Feature: feature name - Q: test statistic (univariate spatial variability) - Z_score: standardized Q by null mean/std - P_value: tail probability under null (if return_pval=True) - P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)

Return type:

pd.DataFrame

Raises:

ValueError – If kernel not initialized, or source is invalid.

Notes

Under H₀: feature has no spatial structure. Under H₁: significant spatial signal (clustering or dispersion).

Zero-variance features are assigned Q=0, P_value=1.0.

Examples

>>> detector.build_kernel_from_coordinates(adata.obsm['spatial'])
>>> results = detector.compute_qstat(source='var', features=['Gene1', 'Gene2'], n_jobs=-1)
>>> top_genes = results.iloc[:10]

compute_rstat(features_x: List[str] | None = None, features_y: List[str] | None = None, source: str = 'var', n_jobs: int = -1, layer: str | None = None, return_pval: bool = True, chunk_size: int = 64) → DataFrame[source]#

Compute bivariate spatial R-statistic (cross-spatial correlation) for feature pairs.

Tests for significant spatial co-variation between pairs of features using the pre-built kernel. Supports symmetric (all pairs within one set) or bipartite (all X vs Y pairs) modes. Parallelizes computation and applies multiple testing correction.

Parameters:

features_x (Optional[List[str]]) – Feature names for the first set. If None and features_y is None, uses all features (symmetric mode).
features_y (Optional[List[str]]) – Feature names for the second set. If None, computes all pairwise within features_x. If provided, computes all X vs Y pairs (bipartite mode).
source (str, default 'var') – Feature source: ‘var’ (genes) or ‘obs’ (metadata columns).
n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.
layer (Optional[str]) – If source=’var’, which layer to use (e.g., ‘raw’, ‘log1p’). If None, uses .X.
return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns R only.
chunk_size (int, default 64) – Number of Y features to process together, pre-computing K@Y_chunk. Larger values use more memory but fewer K matrix multiplications.

Returns:

df – Results sorted by |Z_score| (descending). Columns: - Feature_1: name of first feature - Feature_2: name of second feature - R: test statistic (bivariate spatial correlation, range approximately [-1, 1]) - Z_score: standardized R by null mean/std - P_value: two-tailed p-value under null (if return_pval=True) - P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)

Return type:

pd.DataFrame

Raises:

ValueError – If kernel not initialized, features_x is None when features_y is provided, or no valid pairs generated.

Notes

Under H₀: features are spatially independent. Under H₁: significant spatial co-clustering or co-dispersion.

Different than quadsv.statistics.spatial_r_test, this function always returns R-statistics of all requested feature pairs in the symmetric mode (features_y = None). Thus, for features_x = [A, B, C], the output will contain (A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B), (C, C).

P-value calculation uses Normal approximation based on Trace(K²). Zero-variance features are handled gracefully (assigned R=0, P=1).

Examples

>>> detector.build_kernel_from_coordinates(adata.obsm['spatial'])
>>> # All pairwise correlations within gene set
>>> results = detector.compute_rstat(features_x=['Gene1', 'Gene2', 'Gene3'], n_jobs=-1)
>>> # Cross-correlation between two gene sets
>>> results = detector.compute_rstat(
...     features_x=['Gene1', 'Gene2'],
...     features_y=['Gene3', 'Gene4'],
...     n_jobs=-1
... )

Usage example#

import anndata as ad
from quadsv.detector import PatternDetector

# Load spatial transcriptomics data
adata = ad.read_h5ad("spatial_data.h5ad")

# Initialize detector
detector = PatternDetector(adata, min_cells_frac=0.05)

# Build kernel from coordinates
detector.build_kernel_from_coordinates(
    adata.obsm['spatial'],
    method='car',
    k_neighbors=15,
    rho=0.9
)

# Compute Q-statistics (SVG detection)
results = detector.compute_qstat(
    source='var',
    n_jobs=4,
    return_pval=True
)

# Filter significant genes
svg_genes = results[results['P_adj'] < 0.05]
print(f"Found {len(svg_genes)} SVGs")