quadsv.statistics

Statistical testing framework (Q-tests, R-tests, null approximations).

quadsv.statistics.liu_sf(t: float | ndarray, lambs: ndarray, dofs: ndarray | None = None, deltas: ndarray | None = None, kurtosis: bool = False) → float | ndarray

Liu approximation for a linear combination of noncentral chi-squared variables.

Approximates the tail probability Pr(Q > t) for a weighted sum of noncentral chi-squared random variables. This is the default p-value computation method when exact kernel eigenvalues are known.

Parameters:

t (float or np.ndarray) – Test statistic value(s). Can be scalar or array.
lambs (np.ndarray) – Eigenvalues of the kernel matrix, shape (n_evals,).
dofs (np.ndarray, optional) – Degrees of freedom for each eigenvalue. Default: ones (chi-squared).
deltas (np.ndarray, optional) – Non-centrality parameters. Default: zeros (central chi-squared).
kurtosis (bool, default False) – If True, uses kurtosis-based approximation for edge case.

Returns:

Tail probability Pr(Q > t). Same shape as input t.

Return type:

float or np.ndarray

Notes

Uses a moment-based approximation to the chi-squared mixture distribution. Numerically stable for a wide range of eigenvalue spectra.
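To make the moment-based idea concrete, the following is a minimal sketch of the published Liu-Tang-Zhang moment-matching scheme, written independently of this package. The name liu_sf_sketch is hypothetical, the kurtosis option is not covered, and the spectrum is assumed to have positive skewness; whether liu_sf follows exactly these steps is an assumption.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def liu_sf_sketch(t, lambs, dofs=None, deltas=None):
    """Illustrative skewness-matching approximation to Pr(Q > t).

    Hypothetical helper, not the library's implementation.
    """
    t = np.asarray(t, dtype=float)
    lambs = np.asarray(lambs, dtype=float)
    dofs = np.ones_like(lambs) if dofs is None else np.asarray(dofs, dtype=float)
    deltas = np.zeros_like(lambs) if deltas is None else np.asarray(deltas, dtype=float)

    # c_k = sum(lambda^k * dof) + k * sum(lambda^k * delta), for k = 1..4
    c1, c2, c3, c4 = (
        np.sum(lambs**k * dofs) + k * np.sum(lambs**k * deltas) for k in range(1, 5)
    )
    s1 = c3 / c2**1.5  # proportional to the skewness of Q
    s2 = c4 / c2**2    # proportional to the excess kurtosis of Q

    if s1**2 > s2:
        # Match skewness and kurtosis with a noncentral chi-squared.
        a = 1.0 / (s1 - np.sqrt(s1**2 - s2))
        delta_x = s1 * a**3 - a**2
        dof_x = a**2 - 2.0 * delta_x
    else:
        # Fall back to a central chi-squared that matches skewness only.
        a = 1.0 / s1
        delta_x = 0.0
        dof_x = 1.0 / s1**2

    # Standardize t under Q, then map it onto the matched chi-squared scale.
    t_star = (t - c1) / np.sqrt(2.0 * c2)
    x = t_star * np.sqrt(2.0) * a + dof_x + delta_x

    if delta_x > 0:
        return ncx2.sf(x, dof_x, delta_x)
    return chi2.sf(x, dof_x)
```

With a single unit eigenvalue the matched distribution is chi-squared with one degree of freedom, so the sketch reproduces chi2.sf(t, 1); this makes a convenient sanity check.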

quadsv.statistics.compute_null_params(kernel: Kernel, method: str = 'welch', k_eigen: int | None = None) → dict[str, float | ndarray]

Pre-compute null distribution parameters for spatial tests.

Call this ONCE before running parallel tests on thousands of features. Caches the expensive computations (traces, eigenvalues) for reuse.

Parameters:

kernel (Kernel) – The spatial kernel object (SpatialKernel, FFTKernel, or compatible).
method ({'clt', 'welch', 'liu'}, default 'welch') – Null approximation method:
- ‘clt’: Central Limit Theorem (Z-score normal approximation)
- ‘welch’: Welch-Satterthwaite moment matching (fast, uses traces)
- ‘liu’: Liu eigenvalue-based approximation (accurate tail, slower)
k_eigen (int, optional) – Number of top eigenvalues to compute if method=’liu’ and kernel is sparse. If None, computes all available eigenvalues.

Returns:

Parameters keyed by null_approx method:
- ‘method’: The method used
- For ‘liu’: ‘eigenvalues’ (np.ndarray of kernel eigenvalues)
- For ‘welch’/‘clt’: ‘mean_Q’, ‘var_Q’, and for ‘welch’ also ‘scale_g’, ‘df_h’

Return type:

dict[str, Union[float, np.ndarray]]

Raises:

AssertionError – If method is not one of ‘clt’, ‘welch’, ‘liu’.
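As a rough illustration of what the ‘welch’ parameters could represent, the sketch below derives them from kernel traces for a dense matrix. It assumes x ~ N(0, I) exactly (so E[Q] = tr(K) and Var[Q] = 2 tr(K²)) and ignores any centering applied during standardization. The helper names are hypothetical; only the dictionary keys come from the return description above.

```python
import numpy as np
from scipy.stats import chi2


def welch_params_sketch(K):
    """Moment-match Q = x^T K x to g * chi2(h). Illustrative only."""
    K = np.asarray(K, dtype=float)
    mean_q = np.trace(K)            # E[Q] = tr(K), assuming x ~ N(0, I)
    var_q = 2.0 * np.sum(K * K.T)   # Var[Q] = 2 tr(K^2)
    scale_g = var_q / (2.0 * mean_q)    # g * h = mean
    df_h = 2.0 * mean_q**2 / var_q      # 2 * g^2 * h = variance
    return {
        "method": "welch",
        "mean_Q": mean_q,
        "var_Q": var_q,
        "scale_g": scale_g,
        "df_h": df_h,
    }


def welch_sf_sketch(q, params):
    """Tail probability Pr(Q > q) under the matched scaled chi-squared."""
    return chi2.sf(np.asarray(q, dtype=float) / params["scale_g"], params["df_h"])
```

Because the matching only needs tr(K) and tr(K²), no eigendecomposition is required, which is consistent with the description of ‘welch’ as the fast, trace-based option.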

Examples

>>> kernel = SpatialKernel.from_coordinates(coords, method='gaussian')
>>> params = compute_null_params(kernel, method='welch')
>>> Q, pval = spatial_q_test(data, kernel, null_params=params)

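quadsv.statistics.spatial_q_test(Xn, kernel, null_params=None, return_pval=True, is_standardized=False, chunk_size=-1, show_progress=False)

Univariate spatial Q-test for detecting spatial variability.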

Tests whether a spatial variable exhibits significant clustering or dispersion using the specified kernel weighting scheme. Supports both single features and batch processing with sparse matrices.

Parameters:

Xn (np.ndarray or scipy.sparse matrix) – Input data array of shape (N,) for single feature or (N, M) for M features. Can be dense numpy array or sparse matrix (CSC/CSR format recommended). Should be standardized before calling unless is_standardized=True.
kernel (Kernel) – Pre-constructed kernel object (Kernel, SpatialKernel, FFTKernel, or scipy.sparse matrix).
null_params (dict, optional) – Pre-computed null distribution parameters from compute_null_params(). If None, computed on-the-fly using ‘welch’ method (only accurate when kernel is positive semi-definite).
return_pval (bool, default True) – If True, returns (Q, pval) tuple; if False, returns Q only.
is_standardized (bool, default False) – If True, skips Z-score standardization internally (assumes input is N(0,1)).
chunk_size (int, default -1) – Number of features to process in each chunk. If -1, processes all features at once. Useful for large feature sets to reduce memory usage. Must be <= M.
show_progress (bool, default False) – If True, displays a progress bar during chunk processing.

Returns:

Q (float or np.ndarray) – Test statistic value(s). Shape (M,) if input was 2D, scalar if input was 1D.
pval (float or np.ndarray, optional) – Tail probability under null hypothesis. Only returned if return_pval=True. Same shape as Q.

Raises:

ValueError – If kernel dimensions don’t match data size or if null_params is None and kernel is not a Kernel object.

Notes

Under H₀, the data are spatially independent. Under H₁, a mean shift is present.

The test statistic Q = x^T K x, where K is the kernel matrix, approximately follows a chi-squared mixture distribution:

$$Q \sim \sum_{i=1}^{n} \lambda_i \chi^2_{1}$$

where $\lambda_i$ are the kernel eigenvalues.

By default, the null is approximated using Welch-Satterthwaite moment matching. For more accurate tail probabilities, set null_params = {'method': 'liu'} or pass null_params = compute_null_params(kernel, method='liu').
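The statistic itself is simple to compute in batch. The sketch below is a minimal illustration for a dense kernel matrix, assuming column-wise Z-scoring with the population standard deviation and a plain chunked loop; q_statistics_sketch is a hypothetical name, and the library's internal standardization and chunking may differ.

```python
import numpy as np


def q_statistics_sketch(X, K, chunk_size=-1):
    """Compute Q_j = z_j^T K z_j for every column of X. Illustrative only."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single feature as an (N, 1) batch

    # Z-score each feature (assumption: population std, ddof=0).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)

    n_features = Z.shape[1]
    step = n_features if chunk_size <= 0 else chunk_size
    Q = np.empty(n_features)
    for start in range(0, n_features, step):
        block = Z[:, start:start + step]
        # Column-wise quadratic form: sum_i block[i, j] * (K @ block)[i, j]
        Q[start:start + step] = np.einsum("ij,ij->j", block, K @ block)
    return Q
```

Chunking only bounds the size of the intermediate product K @ block; the resulting Q values do not depend on the chunk size.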

Examples

>>> coords = np.random.randn(100, 2)
>>> kernel = SpatialKernel.from_coordinates(coords, method='gaussian')
>>> data = np.random.randn(100)
>>> Q, pval = spatial_q_test(data, kernel)
>>> # Sparse matrix example
>>> from scipy.sparse import csr_matrix
>>> sparse_data = csr_matrix(np.random.randn(100, 1000))
>>> Q, pval = spatial_q_test(sparse_data, kernel, chunk_size=100, show_progress=True)

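quadsv.statistics.spatial_r_test(Xn, Yn, kernel, null_params=None, return_pval=True, is_standardized=False)

Bivariate spatial R-test for correlation between two spatial variables.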

Computes the pairwise spatial statistic R = x^T K y, testing for spatial association between two variables. Supports batch processing.

Parameters:

Xn (np.ndarray) – First input data vector or batch. Shape (N,) or (N, M).
Yn (np.ndarray) – Second input data vector or batch. Shape (N,) or (N, M) matching Xn.
kernel (Kernel) – Pre-constructed kernel object compatible with xtKy() method.
null_params (dict, optional) – Pre-computed null distribution parameters. Should include ‘var_R’. If None, computed on-the-fly from kernel traces.
return_pval (bool, default True) – If True, returns (R, pval) tuple; if False, returns R only.
is_standardized (bool, default False) – If True, skips Z-score standardization internally.

Returns:

R (float or np.ndarray) – Test statistic value(s). Shape (M,) if input was 2D, scalar if input was 1D.
pval (float or np.ndarray, optional) – Tail probability under null hypothesis (two-tailed test). Only returned if return_pval=True. Based on Normal approximation.

Raises:

ValueError – If Xn and Yn shapes don’t match or kernel dimensions are incompatible.

Notes

Under H₀: the two variables are spatially uncorrelated.

The test statistic R = x^T K y is approximated as Normal under the null:

$$R \sim N(0, \text{Trace}(K^2))$$

The p-value is two-tailed: 2 × Pr(R > |r_obs|), which equals Pr(|R| > |r_obs|) under the symmetric Normal null.
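The Normal approximation above can be sketched for a single pair of vectors and a dense, symmetric kernel matrix as follows. The function name r_test_sketch is hypothetical, the Z-scoring convention (population standard deviation) is an assumption, and the variance is taken as Trace(K²) exactly as stated in the formula.

```python
import numpy as np
from scipy.stats import norm


def r_test_sketch(x, y, K):
    """Return (R, two-tailed p-value) for R = zx^T K zy. Illustrative only."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.asarray(K, dtype=float)

    # Z-score both variables (assumption: population std, ddof=0).
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()

    r = zx @ K @ zy
    var_r = np.sum(K * K.T)  # Trace(K^2)

    # Two-tailed p-value under R ~ N(0, Trace(K^2)).
    pval = 2.0 * norm.sf(abs(r) / np.sqrt(var_r))
    return r, pval
```

For a symmetric K the statistic is unchanged when x and y are swapped, so the test does not distinguish which variable is listed first.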

Examples

>>> coords = np.random.randn(100, 2)
>>> kernel = SpatialKernel.from_coordinates(coords, method='gaussian')
>>> x_data = np.random.randn(100)
>>> y_data = np.random.randn(100)
>>> R, pval = spatial_r_test(x_data, y_data, kernel)

Null approximation methods

Three strategies are available via the null_approx parameter:

clt: O(N) via Hutchinson trace. Normal approximation on Z-scores; usable with indefinite kernels.
welch: O(N) via Hutchinson trace. Gamma moment matching. Default for large N.
liu: O(N³) eigendecomposition. 4-moment weighted chi-square. Most accurate for N < 5000 or FFT grids.

See compute_null_params() for details.
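For readers unfamiliar with the Hutchinson trace estimator mentioned for clt and welch, the sketch below estimates tr(K) and tr(K²) using only matrix-vector products. It is a generic illustration, not the package's code: the function name, the Rademacher probes, and the default probe count are all assumptions.

```python
import numpy as np


def hutchinson_traces_sketch(matvec, n, n_probes=64, seed=0):
    """Estimate tr(K) and tr(K^2) for symmetric K from products K @ z.

    matvec: callable mapping a length-n vector z to K @ z.
    """
    rng = np.random.default_rng(seed)
    tr_k = 0.0
    tr_k2 = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        kz = matvec(z)
        tr_k += z @ kz     # E[z^T K z] = tr(K)
        tr_k2 += kz @ kz   # E[z^T K^T K z] = tr(K^2) for symmetric K
    return tr_k / n_probes, tr_k2 / n_probes
```

Each probe costs one matrix-vector product, so with a sparse or FFT-based kernel the total cost scales with the cost of applying K rather than with a full eigendecomposition. For a diagonal K the estimates are exact, since every Rademacher entry squares to one.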