quadsv.statistics

quadsv.statistics#

Functions#

`auto_chunk_size`(kernel[, n_jobs, budget_bytes])	Pick a per-backend-optimal `chunk_size` for the Q / R test.
`compute_null_params`(kernel[, method, k_eigen, ...])	Pre-compute null distribution parameters for spatial tests.
`effective_rank`(cov[, weights])	Effective rank (participation ratio) of a covariance matrix.
`gene_pattern_diversity`(spectra[, weights, eps])	Spatial-pattern diversity across genes within a single sample.
`liu_sf`(t, lambs[, dofs, deltas, kurtosis, n])	Liu approximation to a linear combination of non-central chi-squared variables.
`spatial_q_test`(Xn, kernel[, null_params, return_pval, ...])	Univariate spatial Q-test for detecting spatial variability.
`spatial_r_test`(Xn, Yn, kernel[, null_params, ...])	Bivariate spatial R-test for correlation between two spatial variables.
`within_group_pattern_diversity`(spectra, groups[, ...])	Spatial-pattern diversity of the within-group residual covariance.

Module Contents#

quadsv.statistics.auto_chunk_size(kernel, n_jobs=1, budget_bytes=_DEFAULT_CHUNK_BUDGET)[source]#

Pick a per-backend-optimal chunk_size for the Q / R test.

The returned value is used by spatial_q_test() / spatial_r_test() (and by DetectorGrid.compute_qstat() / DetectorIrregular.compute_qstat()) to split a multi-feature batch into chunks. It is the smaller of two caps:

Cache sweet-spot cap — empirical sweep of per-feature time vs chunk at n ∈ {30k, 100k, 300k, 1M}:

Backend

chunk cap

FFTKernel

32

NUFFTKernel

64

MatrixKernel (any sub-type)

16 (n < 200k); 8 (n ≥ 200k)

Matrix backends don’t vectorise over RHS columns (scipy CSR SpMV, SuperLU triangular solve), and the chunk size cap is determined empirically for best per-feature speed under the given memory constraints. FFT / NUFFT do benefit from BLAS / n_transf batching, but their complex workspace spills L3 past the listed cap (15× slowdown for FFT at chunk=512, 1.9× for NUFFT at chunk=256).
Memory cap — budget_bytes / n_jobs // per_feat, where per_feat is the backend-specific transient bytes per feature:
- MatrixKernel dense / sparse: 16 · n
- MatrixKernel precision-stored CAR: 24 · n
- FFTKernel: 24 · n
- NUFFTKernel: 16 · ny·nx + 8 · n

Parameters:

kernel (Kernel) – The backend kernel the chunk will operate on.
n_jobs (int, default 1) – Number of parallel workers the caller plans to use. The budget_bytes is divided by n_jobs so aggregate live memory stays bounded.
budget_bytes (int, default 2 GiB) – Aggregate live-memory cap across all workers.

Returns:

A chunk_size in [8, chunk_cap]. The lower bound of 8 ensures measurements stay meaningful even when a single feature consumes most of the per-worker budget.

Return type:

int

quadsv.statistics.compute_null_params(kernel, method='welch', k_eigen=None, dirichlet_correction=True, liu_n_probes=None)[source]#

Pre-compute null distribution parameters for spatial tests.

Call this ONCE before running parallel tests on thousands of features. Caches the expensive computations (traces, cumulants, shifted-χ² fit) for reuse across both Q-tests and R-tests.

Parameters:

kernel (Kernel) – The spatial kernel object (MatrixKernel, FFTKernel, NUFFTKernel, or compatible).
method ({'clt', 'welch', 'liu'}, default 'welch') –
Null approximation method for the Q-test. The R-test entry var_R = trace(K̃²) is always populated alongside, regardless of method — R-tests use a Normal approximation and only need this one moment.
- ’clt’: Central Limit Theorem (Z-score normal approximation)
- ’welch’: Welch-Satterthwaite moment matching (fast, uses traces)
- ’liu’: Liu eigenvalue-based approximation (accurate tail, slower)
k_eigen (int, optional) – Number of top eigenvalues to compute if method=’liu’ and kernel is sparse. If None, computes all available eigenvalues. Ignored when liu_n_probes is set (eigenvalues are bypassed entirely).
liu_n_probes (int, optional) – If set, bypass the eigendecomposition for method='liu' and estimate the four spectral cumulants c_p = trace(K̃^p), p = 1..4, directly from the kernel via Hutchinson probing (_hutchinson_cumulants()). Cost drops from \(O(n^3)\) (dense eigensolve) or \(O(r^3)\) (reduced Toeplitz-M) to \(2 \cdot n_\mathrm{probes}\) kernel matvecs, at the cost of \(O(n_\mathrm{probes}^{-1/2})\) Monte-Carlo error in each cumulant. Rule of thumb: n_probes = 60 gives Liu p-values within ~5\% of the eigenvalue-exact baseline; n_probes = 120 within ~0.2\%. When None (default), the eigenvalue path is used.
dirichlet_correction (bool, default True) – When True: use the finite-n Dirichlet(1/2) ratio Var[Q] = 2 · (m · trace((HKH)²) - trace(HKH)²) / (m+2) with m = n-1. When False: drop the mean² term to the large-n limit Var[Q] = 2 · trace((HKH)²), a monotonic upper bound that slightly overestimates Var[Q] at finite n. See Notes for the derivation.

Returns:

dict[str, float or np.ndarray] – Always populated (regardless of method):
- 'method' : str — the Q-test approximation selected.
- 'var_R' : float — trace(K̃²), the null variance of R (used by spatial_r_test()).
Method-specific additions:
- method='liu' (default for FFT / NUFFT kernels):
  - 'cumulants' : dict {1: c_1, 2: c_2, 3: c_3, 4: c_4} with c_p = trace(K̃^p). Computed from the full eigendecomposition when available (liu_n_probes is None) or from \(2m\) Hutchinson probes otherwise.
  - 'liu_coef' : dict with cached Liu coefficients {'mu_Q', 'sigma_Q', 'mu_x', 'sigma_x', 'dof_x', 'delta_x'} derived from cumulants once; consumed by spatial_q_test() so per-feature p-values reduce to a pure \(t\)-broadcast.
- method='welch' (default for MatrixKernel Q-tests):
  - 'mean_Q' : trace(K)
  - 'var_Q' : 2 · trace(K²)
  - 'scale_g' : Welch scale parameter var_Q / (2 · mean_Q)
  - 'df_h' : Welch df 2 · mean_Q² / var_Q
- method='clt': 'mean_Q', 'var_Q' only.
Consumers (spatial_q_test / spatial_r_test) read only the keys
their approximation needs; the dict is safe to reuse across calls.

Raises:

AssertionError – If method is not one of ‘clt’, ‘welch’, ‘liu’.

Return type:

dict[str, float | ndarray]

Examples

>>> kernel = MatrixKernel.from_coordinates(coords, method='gaussian')
>>> params = compute_null_params(kernel, method='welch')
>>> Q, pval = spatial_q_test(data, kernel, null_params=params)
>>> R, r_pval = spatial_r_test(x, y, kernel, null_params=params)

Notes

spatial_q_test() standardizes its input as \(Z = (X - \bar{X}\,\mathbf{1}) / \sigma\), so the realized quadratic form is

\[Q \;=\; Z^{\top} K Z \;=\; X^{\top}\, H K H\, X / \sigma^{2}, \qquad H = I - \mathbf{1}\mathbf{1}^{\top} / n.\]

Null moments are therefore for the double-centered operator \(\tilde{K} = H K H\), not raw \(K\). \(Q\) is additionally a ratio of quadratic forms — the denominator \(\sigma^{2} = X^{\top} H X / (n-1)\) is a random variable correlated with the numerator. dirichlet_correction=True applies the finite-\(n\) correction derived from the Dirichlet(1/2) distribution of \(Y_i = X_i^{2} / \sum_j X_j^{2}\):

\[\mathrm{Var}[Q] \;=\; \frac{2 \bigl[\, m \cdot \mathrm{tr}(\tilde K^{2}) - \mathrm{tr}(\tilde K)^{2} \,\bigr]}{m + 2}, \qquad m = n - 1.\]

With dirichlet_correction=False the finite term drops out and \(\mathrm{Var}[Q] = 2\,\mathrm{tr}(\tilde K^{2})\) (large-\(n\) limit).

quadsv.statistics.effective_rank(cov, weights=None)[source]#

Effective rank (participation ratio) of a covariance matrix.

Computes

\[K_\mathrm{eff} \;=\; \frac{\big(\sum_k \lambda_k\big)^2}{\sum_k \lambda_k^2}\]

where \(\lambda_k\) are the eigenvalues of cov (or, when weights is given, of \(W^{1/2} \mathrm{cov}\, W^{1/2}\) with \(W = \mathrm{diag}(w)\)). The result is bounded by 1 (rank-1, all variance on a single direction) and K = cov.shape[0] (uniformly spread, all eigenvalues equal). It coincides with the standard inverse Herfindahl index of the (normalised) eigenvalue distribution and quantifies the “effective number of independent components” of a quadratic-form statistic \(T^2 = X^\top \mathrm{cov} X\) where \(X \sim \mathcal{N}(0,I)\).

Parameters:

cov (np.ndarray) – Symmetric (K, K) covariance matrix. Negative eigenvalues from numerical noise are clipped to 0.
weights (np.ndarray, optional) – Non-negative weights of length K. When provided, returns the effective rank of the weighted form \(W^{1/2} \mathrm{cov}\,W^{1/2}\), useful for analysing how a frequency-weighted L2 statistic actually distributes its sensitivity across eigen-directions.

Returns:

Effective rank in [1, K]. Returns nan when the trace is non-positive (degenerate covariance).

Return type:

float

Examples

>>> import numpy as np
>>> effective_rank(np.eye(10))
10.0
>>> # Rank-1 outer product
>>> v = np.zeros(10); v[0] = 1.0
>>> effective_rank(np.outer(v, v))
1.0

quadsv.statistics.gene_pattern_diversity(spectra, weights=None, *, eps=1e-12)[source]#

Spatial-pattern diversity across genes within a single sample.

Quantifies how heterogeneous the per-gene spatial-frequency profiles are within one sample. Computes the cross-gene covariance of the log-spectra,

\[\hat\Sigma_\mathrm{genes} \;=\; \frac{1}{G - 1} \sum_g \big(\ell_g - \bar\ell\big)\big(\ell_g - \bar\ell\big)^\top\]

where \(\ell_g \in \mathbb{R}^K\) is gene \(g\)’s radially-binned log-spectrum and \(\bar\ell\) the mean across genes, then returns effective_rank(Σ_genes, weights).

Interpretation:

Low diversity (\(K_\mathrm{eff} \approx 1\)): most genes share the same spatial-frequency profile — the sample’s spatial patterns collapse onto a single dominant mode (e.g. all “smooth” or all “punctate”).
High diversity (\(K_\mathrm{eff} \approx K\)): genes vary widely in their spatial structure — the sample carries a rich mix of spatial scales.

Parameters:

spectra (np.ndarray) – (n_genes, K) raw spectrum matrix (typically a single sample’s radially-binned spectrum). Log is taken internally with an eps floor.
weights (np.ndarray, optional) – Per-bin weights, same semantics as effective_rank().
eps (float, default 1e-12) – Floor for log(spectra) to handle exact-zero bins.

Return type:

float

quadsv.statistics.liu_sf(t, lambs, dofs=None, deltas=None, kurtosis=False, n=None)[source]#

Liu approximation to a linear combination of non-central chi-squared variables.

One-shot convenience wrapper equivalent to _liu_apply(t, _liu_prepare(lambs, ...)). For multi-feature workloads, prefer the split form: call _liu_prepare() once on the spectrum and _liu_apply() for each Q-batch (compute_null_params() already caches the coefficients under null_params['liu_coef'], so spatial_q_test() does this automatically).

Parameters:

t (float or np.ndarray) – Test statistic value(s). Array input is broadcast efficiently through a single scipy.stats.ncx2.sf() call.
lambs (np.ndarray) – Eigenvalues of the kernel matrix, shape (n_evals,).
dofs (np.ndarray, optional) – Per-eigenvalue degrees of freedom. Default: ones (chi-squared).
deltas (np.ndarray, optional) – Non-centrality parameters. Default: zeros (central).
kurtosis (bool, default False) – If True, use the kurtosis-based edge-case approximation.
n (int, optional) – Sample size. When provided, applies the Dirichlet(1/2) variance correction Var[Q] = 2·(m·c_2 - c_1²)/(m+2) with m = n-1 for the z-scored ratio Q = XᵀK̃X/σ̂². Essential for broad-spectrum PSD kernels (CAR, graph_laplacian) on dense grids, where the large-\(n\) limit 2·c_2 overestimates Var[Q] and collapses the tail to zero. Default None keeps the original large-\(n\) behavior for back-compat with callers supplying a raw eigenvalue mixture.

Returns:

Tail probability Pr(Q > t) with the same shape as t.

Return type:

float or np.ndarray

quadsv.statistics.spatial_q_test(Xn, kernel, null_params=None, return_pval=True, is_standardized=False, chunk_size='auto', show_progress=False)[source]#

Univariate spatial Q-test for detecting spatial variability.

Top-level chunking wrapper — splits the feature batch along the trailing axis into blocks of chunk_size features, dispatches each block to the backend-specific per-chunk helper (quadsv.kernels.fft._q_test_fft(), quadsv.kernels.nufft._q_test_nufft(), or _q_test_matrix()), and concatenates the results. The per-chunk helpers do not handle chunking themselves.

Parameters:

Xn (np.ndarray or scipy.sparse matrix) – Input data of shape (n,) / (n, M) for MatrixKernel and NUFFTKernel, or (ny, nx) / (ny, nx, M) for FFTKernel. Can be dense numpy array or sparse matrix (CSC/CSR recommended) for MatrixKernel; FFT/NUFFT paths require dense input.
kernel (Kernel) – Pre-constructed Kernel (MatrixKernel / FFTKernel / NUFFTKernel) or a raw dense / sparse kernel matrix.
null_params (dict, optional) – Pre-computed null distribution parameters from compute_null_params(). Resolved once at the top level if None and shared across chunks (no redundant recomputation).
return_pval (bool, default True) – If True, returns (Q, pval); else returns Q only.
is_standardized (bool, default False) – If True, skips Z-score standardization internally.
chunk_size (int or 'auto', default 'auto') – Number of features processed per per-chunk dispatch call. 'auto' defers to auto_chunk_size() (backend-specific cache sweet spot under a 2 GiB live-memory budget); -1 processes the full batch in a single call. For a cross-backend cost model see Scalable Computation.
show_progress (bool, default False) – If True, displays a tqdm bar over chunks (only when M > chunk_size).

Returns:

Q (float or np.ndarray) – Test statistic value(s). Shape (M,) for 2-D / 3-D inputs, scalar for 1-D.
pval (float or np.ndarray, optional) – Tail probability under null hypothesis; returned only if return_pval=True. Same shape as Q.

Return type:

float | ndarray | tuple[float | ndarray, float | ndarray]

Notes

Under H₀: data is spatially independent. Under H₁: mean shift present. The test statistic Q = xᵀ K x approximates a chi-squared mixture under the null; see Theoretical Results and Scalable Computation.

Examples

>>> coords = np.random.randn(100, 2)
>>> kernel = MatrixKernel.from_coordinates(coords, method='gaussian')
>>> data = np.random.randn(100)
>>> Q, pval = spatial_q_test(data, kernel)
>>> # Sparse-matrix batch of features (auto-chunked):
>>> from scipy.sparse import csr_matrix
>>> sparse_data = csr_matrix(np.random.randn(100, 1000))
>>> Q, pval = spatial_q_test(sparse_data, kernel, show_progress=True)

quadsv.statistics.spatial_r_test(Xn, Yn, kernel, null_params=None, return_pval=True, is_standardized=False, chunk_size='auto', show_progress=False)[source]#

Bivariate spatial R-test for correlation between two spatial variables.

Top-level chunking wrapper — splits the paired feature batch along the trailing axis into blocks of chunk_size features, dispatches each block to the backend-specific per-chunk helper (quadsv.kernels.fft._r_test_fft(), quadsv.kernels.nufft._r_test_nufft(), or _r_test_matrix()), and concatenates the results. The per-chunk helpers do not handle chunking themselves.

Parameters:

Xn (np.ndarray or scipy.sparse matrix) – First input. Shape (n,) / (n, M) for MatrixKernel and NUFFTKernel, (ny, nx) / (ny, nx, M) for FFTKernel.
Yn (np.ndarray or scipy.sparse matrix) – Second input, same shape as Xn (paired R-test). For NUFFTKernel a bipartite mode with M_x != M_y is passed through without chunking.
kernel (Kernel) – Pre-constructed Kernel.
null_params (dict, optional) – Pre-computed null parameters; only 'var_R' is consumed. Resolved once at the top level if None and shared across chunks.
return_pval (bool, default True) – If True, returns (R, pval); else returns R only.
is_standardized (bool, default False) – If True, skips Z-score standardization internally.
chunk_size (int or 'auto', default 'auto') – Number of feature pairs processed per per-chunk dispatch call. 'auto' defers to auto_chunk_size(); -1 processes the full batch in a single call. For a cross-backend cost model see Scalable Computation.
show_progress (bool, default False) – If True, displays a tqdm bar over chunks (only when M > chunk_size).

Returns:

R (float or np.ndarray) – Test statistic value(s). Shape (M,) for 2-D / 3-D inputs, scalar for 1-D. For bipartite NUFFT input (M_x != M_y), shape (M_x, M_y).
pval (float or np.ndarray, optional) – Two-tailed tail probability under null hypothesis; returned only if return_pval=True.

Return type:

float | ndarray | tuple[float | ndarray, float | ndarray]

Notes

Under H₀, R = xᵀ K y is approximated as \(\mathcal{N}(0, \mathrm{trace}((HKH)^2))\) on z-scored inputs. See Theoretical Results and Scalable Computation.

Examples

>>> coords = np.random.randn(100, 2)
>>> kernel = MatrixKernel.from_coordinates(coords, method='gaussian')
>>> x_data = np.random.randn(100)
>>> y_data = np.random.randn(100)
>>> R, pval = spatial_r_test(x_data, y_data, kernel)

quadsv.statistics.within_group_pattern_diversity(spectra, groups, weights=None, *, eps=1e-12)[source]#

Spatial-pattern diversity of the within-group residual covariance.

For a cohort of samples partitioned into two groups, computes the pooled-across-genes within-group covariance of log-spectra (the same estimator used by log_l2 + null='wald' in the comparator), then returns its effective rank.

Interpretation:

Low diversity (\(K_\mathrm{eff} \approx 1\)): within-group sample-to-sample variation aligns with one spatial-frequency direction. Wald-type tests on this cohort effectively reduce to a 1-DoF test → high power per direction but very sensitive to estimation noise in that single eigenvalue.
High diversity (\(K_\mathrm{eff} \approx K\)): noise spreads over many directions; Wald’s analytic null is more accurate (CLT smoothing).

Parameters:

spectra (np.ndarray) – (n_samples, n_genes, K) spectrum tensor (raw, not logged).
groups (np.ndarray) – (n_samples,) with exactly two distinct labels.
weights (np.ndarray, optional) – Per-bin weights, same semantics as effective_rank().
eps (float, default 1e-12) – Floor for log(spectra).

Returns:

Effective rank of the pooled within-group covariance.

Return type:

float

Backend	chunk cap
`FFTKernel`	32
`NUFFTKernel`	64
MatrixKernel (any sub-type)	16 (`n < 200k`); 8 (`n ≥ 200k`)