quadsv.detector#
High-level pattern detection for AnnData objects.
- class quadsv.detector.PatternDetector(adata: Any, min_cells: int = 1, min_cells_frac: float | None = None)[source]#
Bases:
objectDetect spatial or graph-based feature variability using the quadratic-form statistic.
This class implements univariate (Q-test) and bivariate (R-test) spatial statistics for feature screening using kernel-based methods. Supports multiple kernel types (Gaussian, Matérn, Moran’s I, graph Laplacian, CAR) and integrates with scanpy/AnnData.
The core test statistic is: - Univariate: $Q = mathbf{x}^T mathbf{K} mathbf{x}$ for standardized feature vector $mathbf{x}$ and kernel $mathbf{K}$. - Bivariate: $R = mathbf{x}^T mathbf{K} mathbf{y}$ for cross-spatial correlation.
- Parameters:
adata (scanpy.AnnData) – Annotated data matrix containing gene expression or other features.
min_cells (int, default 1) – Minimum number of non-zero observations required for a feature to be tested.
min_cells_frac (float, optional) – If provided, overrides min_cells to be this fraction of total cells.
- adata#
Reference to the input data object.
- Type:
scanpy.AnnData
- build_kernel_from_coordinates(coords, method='car', \*\*kernel_params)[source]#
Construct kernel from spatial coordinates.
- build_kernel_from_obsp(key, is_distance=False, method='car', \*\*kernel_params)[source]#
Construct kernel from precomputed distance/adjacency matrix.
- compute_qstat(source='var', features=None, n_jobs=-1, layer=None, return_pval=True)[source]#
Compute univariate spatial statistics.
- compute_rstat(features_x=None, features_y=None, source='var', n_jobs=-1, layer=None, return_pval=True)[source]#
Compute bivariate spatial statistics.
Examples
>>> import anndata >>> adata = anndata.read_h5ad('data.h5ad') >>> detector = PatternDetector(adata, min_cells=10) >>> detector.build_kernel_from_coordinates(adata.obsm['spatial'], method='gaussian', bandwidth=2.0) >>> q_results = detector.compute_qstat(source='var', n_jobs=4)
- __init__(adata: Any, min_cells: int = 1, min_cells_frac: float | None = None) None[source]#
Initializes the PatternDetector with an AnnData object.
- build_kernel_from_coordinates(coords: ndarray, method: str = 'car', **kernel_params: Any) None[source]#
Builds a spatial kernel from an array of coordinates.
Constructs distance-based kernel matrix from sample coordinates using specified method. Updates self.kernel_ and self.kernel_params_.
- Parameters:
coords (np.ndarray) – Coordinate array of shape (n_samples, n_dims). Typically 2D spatial coordinates.
method (str, default 'car') – Kernel construction method. Must be one of: ‘gaussian’, ‘matern’, ‘moran’, ‘graph_laplacian’, ‘car’.
kernel_params (dict, optional) – Additional kernel-specific parameters (e.g., bandwidth, rho, k_neighbors, nu).
- Raises:
ValueError – If method not in available kernels or coordinate shape mismatch.
Notes
Common parameters by method: - gaussian: bandwidth (float, default 1.0) - matern: bandwidth (float, default 1.0), nu (float, default 1.5) - moran: k_neighbors (int, default 10) - graph_laplacian: k_neighbors (int, default 10) - car: rho (float in [0, 1), default 0.9), k_neighbors (int, default 10)
Examples
>>> coords = np.random.randn(1000, 2) >>> detector.build_kernel_from_coordinates(coords, method='gaussian', bandwidth=1.5)
- build_kernel_from_obsp(key: str, is_distance: bool = False, method: str = 'car', **kernel_params: Any) None[source]#
Builds a graph kernel from a precomputed matrix in adata.obsp.
Performs necessary transformations (e.g., distance to Gaussian weights, adjacency to graph Laplacian) before initializing the kernel object. Isolated nodes (zero degree) are automatically removed.
- Parameters:
key (str) – Key in adata.obsp containing the precomputed matrix.
is_distance (bool, default False) – If True, matrix is treated as distances. Otherwise treated as adjacency/connectivity.
method (str, default 'car') – Kernel construction method. Must be one of: ‘gaussian’, ‘matern’, ‘moran’, ‘graph_laplacian’, ‘car’, or ‘precomputed’. - ‘precomputed’: Uses obsp[key] directly as kernel matrix K (no transformation). - Other methods: Apply distance/adjacency transformations.
kernel_params (dict, optional) – Additional kernel-specific parameters (e.g., bandwidth, rho, nu).
- Raises:
KeyError – If key not found in adata.obsp.
ValueError – If method not recognized or invalid for matrix type (e.g., Gaussian on adjacency).
Notes
For distance matrices (is_distance=True): - gaussian: $K = exp(-d^2 / (2 sigma^2))$ where $sigma$ is bandwidth. - matern: Matérn kernel with nu and bandwidth parameters.
For adjacency matrices (is_distance=False): - Symmetric normalization: $W_{norm} = D^{-1/2} W D^{-1/2}$. - moran: Uses W_norm directly as kernel. - graph_laplacian: $K = I - W_{norm}$. - car: $K = (I - rho W_{norm})^{-1}$ (implicit kernel).
Isolated nodes (degree=0) are automatically removed from the data.
Examples
>>> detector.build_kernel_from_obsp('distances', is_distance=True, method='gaussian', bandwidth=5.0)
- compute_qstat(source: str = 'var', features: List[str] | None = None, n_jobs: int = -1, layer: str | None = None, return_pval: bool = True, chunk_size: int = 64) DataFrame[source]#
Compute univariate spatial Q-statistic for selected features.
Tests each feature for significant spatial clustering or dispersion using the pre-built kernel. Parallelizes across features and applies Benjamini-Hochberg multiple testing correction.
- Parameters:
source (str, default 'var') – Feature source: ‘var’ (genes) or ‘obs’ (metadata columns).
features (Optional[List[str]]) – Feature names to test. If None, tests all features in source.
n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.
layer (Optional[str]) – If source=’var’, which layer to use (e.g., ‘raw’, ‘log1p’). If None, uses .X.
return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns Q only.
chunk_size (int, default 64) – Number of features to process in each internal batch for vectorization. If n_jobs > 1, each worker processes chunk_size features at a time. May be adjusted for memory/speed tradeoff.
- Returns:
df – Results sorted by Q (descending). Columns: - Feature: feature name - Q: test statistic (univariate spatial variability) - Z_score: standardized Q by null mean/std - P_value: tail probability under null (if return_pval=True) - P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)
- Return type:
pd.DataFrame
- Raises:
ValueError – If kernel not initialized, or source is invalid.
Notes
Under H₀: feature has no spatial structure. Under H₁: significant spatial signal (clustering or dispersion).
Zero-variance features are assigned Q=0, P_value=1.0.
Examples
>>> detector.build_kernel_from_coordinates(adata.obsm['spatial']) >>> results = detector.compute_qstat(source='var', features=['Gene1', 'Gene2'], n_jobs=-1) >>> top_genes = results.iloc[:10]
- compute_rstat(features_x: List[str] | None = None, features_y: List[str] | None = None, source: str = 'var', n_jobs: int = -1, layer: str | None = None, return_pval: bool = True, chunk_size: int = 64) DataFrame[source]#
Compute bivariate spatial R-statistic (cross-spatial correlation) for feature pairs.
Tests for significant spatial co-variation between pairs of features using the pre-built kernel. Supports symmetric (all pairs within one set) or bipartite (all X vs Y pairs) modes. Parallelizes computation and applies multiple testing correction.
- Parameters:
features_x (Optional[List[str]]) – Feature names for the first set. If None and features_y is None, uses all features (symmetric mode).
features_y (Optional[List[str]]) – Feature names for the second set. If None, computes all pairwise within features_x. If provided, computes all X vs Y pairs (bipartite mode).
source (str, default 'var') – Feature source: ‘var’ (genes) or ‘obs’ (metadata columns).
n_jobs (int, default -1) – Number of parallel jobs. -1 uses all available cores; 1 for sequential.
layer (Optional[str]) – If source=’var’, which layer to use (e.g., ‘raw’, ‘log1p’). If None, uses .X.
return_pval (bool, default True) – If True, returns p-values and BH-corrected p-values. If False, returns R only.
chunk_size (int, default 64) – Number of Y features to process together, pre-computing K@Y_chunk. Larger values use more memory but fewer K matrix multiplications.
- Returns:
df – Results sorted by |Z_score| (descending). Columns: - Feature_1: name of first feature - Feature_2: name of second feature - R: test statistic (bivariate spatial correlation, range approximately [-1, 1]) - Z_score: standardized R by null mean/std - P_value: two-tailed p-value under null (if return_pval=True) - P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)
- Return type:
pd.DataFrame
- Raises:
ValueError – If kernel not initialized, features_x is None when features_y is provided, or no valid pairs generated.
Notes
Under H₀: features are spatially independent. Under H₁: significant spatial co-clustering or co-dispersion.
Different than quadsv.statistics.spatial_r_test, this function always returns R-statistics of all requested feature pairs in the symmetric mode (features_y = None). Thus, for features_x = [A, B, C], the output will contain (A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B), (C, C).
P-value calculation uses Normal approximation based on Trace(K²). Zero-variance features are handled gracefully (assigned R=0, P=1).
Examples
>>> detector.build_kernel_from_coordinates(adata.obsm['spatial']) >>> # All pairwise correlations within gene set >>> results = detector.compute_rstat(features_x=['Gene1', 'Gene2', 'Gene3'], n_jobs=-1) >>> # Cross-correlation between two gene sets >>> results = detector.compute_rstat( ... features_x=['Gene1', 'Gene2'], ... features_y=['Gene3', 'Gene4'], ... n_jobs=-1 ... )
Usage example#
import anndata as ad
from quadsv.detector import PatternDetector
# Load spatial transcriptomics data
adata = ad.read_h5ad("spatial_data.h5ad")
# Initialize detector
detector = PatternDetector(adata, min_cells_frac=0.05)
# Build kernel from coordinates
detector.build_kernel_from_coordinates(
adata.obsm['spatial'],
method='car',
k_neighbors=15,
rho=0.9
)
# Compute Q-statistics (SVG detection)
results = detector.compute_qstat(
source='var',
n_jobs=4,
return_pval=True
)
# Filter significant genes
svg_genes = results[results['P_adj'] < 0.05]
print(f"Found {len(svg_genes)} SVGs")