quadsv.statistics
Statistical testing framework (Q-tests, R-tests, null approximations).
- quadsv.statistics.liu_sf(t: float | ndarray, lambs: ndarray, dofs: ndarray | None = None, deltas: ndarray | None = None, kurtosis: bool = False) → float | ndarray
Liu approximation to linear combination of noncentral chi-squared variables.
Approximates the tail probability Pr(Q > t) for a weighted sum of noncentral chi-squared random variables. This is the default p-value computation method when exact kernel eigenvalues are known.
- Parameters:
t (float or np.ndarray) – Test statistic value(s). Can be scalar or array.
lambs (np.ndarray) – Eigenvalues of the kernel matrix, shape (n_evals,).
dofs (np.ndarray, optional) – Degrees of freedom for each eigenvalue. Default: ones (chi-squared).
deltas (np.ndarray, optional) – Non-centrality parameters. Default: zeros (central chi-squared).
kurtosis (bool, default False) – If True, uses kurtosis-based approximation for edge case.
- Returns:
Tail probability Pr(Q > t). Same shape as input t.
- Return type:
float or np.ndarray
Notes
Uses moment-based approximation with chi-squared mixture distribution. Numerically stable for a wide range of eigenvalue spectra.
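The moment-matching idea can be sketched as follows. This is a minimal illustration written from the published Liu et al. formulas, not quadsv's own implementation: the name `liu_sf_sketch` is hypothetical, the eigenvalues are assumed non-negative with at least one positive, and the `kurtosis=True` variant is not covered.

```python
import numpy as np
from scipy.stats import chi2, ncx2


def liu_sf_sketch(t, lambs, dofs=None, deltas=None):
    """Approximate Pr(Q > t) for Q = sum_i lambs[i] * chi2(dofs[i], deltas[i]).

    Hypothetical helper for illustration only; assumes lambs >= 0.
    """
    lambs = np.asarray(lambs, dtype=float)
    dofs = np.ones_like(lambs) if dofs is None else np.asarray(dofs, dtype=float)
    deltas = np.zeros_like(lambs) if deltas is None else np.asarray(deltas, dtype=float)
    t = np.asarray(t, dtype=float)

    # c_k = sum(lambda^k * dof) + k * sum(lambda^k * delta) for k = 1..4
    c = [np.sum(lambs**k * dofs) + k * np.sum(lambs**k * deltas) for k in (1, 2, 3, 4)]
    s1 = c[2] / c[1] ** 1.5  # skewness-like term
    s2 = c[3] / c[1] ** 2    # kurtosis-like term

    if s1**2 > s2:
        # Match skewness and kurtosis with a noncentral chi-squared
        a = 1.0 / (s1 - np.sqrt(s1**2 - s2))
        delta_x = s1 * a**3 - a**2
        dof_x = a**2 - 2.0 * delta_x
    else:
        # Fall back to a central chi-squared that matches skewness only
        a = 1.0 / s1
        delta_x = 0.0
        dof_x = 1.0 / s1**2

    mu_q, sigma_q = c[0], np.sqrt(2.0 * c[1])      # mean and sd of Q
    mu_x, sigma_x = dof_x + delta_x, np.sqrt(2.0) * a  # mean and sd of the reference variable
    t_star = (t - mu_q) / sigma_q * sigma_x + mu_x     # standardize, then rescale

    if delta_x > 0:
        return ncx2.sf(t_star, dof_x, delta_x)
    return chi2.sf(t_star, dof_x)
```

With a single unit eigenvalue the mixture is exactly a chi-squared with one degree of freedom, so `liu_sf_sketch(3.84, [1.0])` agrees with `chi2.sf(3.84, 1)`.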
- quadsv.statistics.compute_null_params(kernel: Kernel, method: str = 'welch', k_eigen: int | None = None) → dict[str, float | ndarray]
Pre-compute null distribution parameters for spatial tests.
Call this ONCE before running parallel tests on thousands of features. Caches the expensive computations (traces, eigenvalues) for reuse.
- Parameters:
kernel (Kernel) – The spatial kernel object (SpatialKernel, FFTKernel, or compatible).
method ({'clt', 'welch', 'liu'}, default 'welch') – Null approximation method:
- ‘clt’: Central Limit Theorem (Z-score normal approximation)
- ‘welch’: Welch-Satterthwaite moment matching (fast, uses traces)
- ‘liu’: Liu eigenvalue-based approximation (accurate tail, slower)
k_eigen (int, optional) – Number of top eigenvalues to compute if method='liu' and kernel is sparse. If None, computes all available eigenvalues.
- Returns:
Parameters keyed by null_approx method:
- ‘method’: The method used
- For ‘liu’: ‘eigenvalues’ (np.ndarray of kernel eigenvalues)
- For ‘welch’/‘clt’: ‘mean_Q’, ‘var_Q’, and for ‘welch’ also ‘scale_g’, ‘df_h’
- Return type:
dict[str, float | ndarray]
- Raises:
AssertionError – If method is not one of ‘clt’, ‘welch’, ‘liu’.
Examples
>>> kernel = SpatialKernel.from_coordinates(coords, method='gaussian')
>>> params = compute_null_params(kernel, method='welch')
>>> Q, pval = spatial_q_test(data, kernel, null_params=params)
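To make the ‘welch’ parameters concrete, here is a minimal sketch of Welch-Satterthwaite moment matching on a dense kernel matrix. It assumes standardized Gaussian input, so that E[Q] = tr(K) and Var[Q] = 2 tr(K²) for symmetric K. The function names are hypothetical, and the way quadsv actually obtains the traces (for example, stochastic estimation) may differ.

```python
import numpy as np
from scipy.stats import gamma


def welch_null_params_sketch(K):
    """Match a scaled chi-squared g * chi2(h) to the first two moments of Q = x^T K x."""
    K = np.asarray(K, dtype=float)
    mean_q = np.trace(K)             # E[Q] = tr(K) for x ~ N(0, I)
    var_q = 2.0 * np.sum(K * K)      # Var[Q] = 2 tr(K^2) for symmetric K
    scale_g = var_q / (2.0 * mean_q)  # g: scale of the matched chi-squared
    df_h = 2.0 * mean_q**2 / var_q    # h: effective degrees of freedom
    return {"method": "welch", "mean_Q": mean_q, "var_Q": var_q,
            "scale_g": scale_g, "df_h": df_h}


def welch_sf_sketch(q, params):
    """Pr(Q > q) under the matched null; g * chi2(h) is Gamma(shape=h/2, scale=2g)."""
    return gamma.sf(q, a=params["df_h"] / 2.0, scale=2.0 * params["scale_g"])


# Compute the parameters once, then reuse them for every feature.
params = welch_null_params_sketch(np.eye(4))
pval = welch_sf_sketch(9.49, params)
```

For the 4×4 identity kernel the match is exact (g = 1, h = 4), so the p-value at 9.49 is close to 0.05.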
- quadsv.statistics.spatial_q_test(Xn: ndarray | spmatrix, kernel: Kernel, null_params: dict | None = None, return_pval: bool = True, is_standardized: bool = False, chunk_size: int = -1, show_progress: bool = False) → float | ndarray | Tuple[float | ndarray, float | ndarray]
Univariate spatial Q-test for detecting spatial variability.
Tests whether a spatial variable exhibits significant clustering or dispersion using the specified kernel weighting scheme. Supports both single features and batch processing with sparse matrices.
- Parameters:
Xn (np.ndarray or scipy.sparse matrix) – Input data array of shape (N,) for single feature or (N, M) for M features. Can be dense numpy array or sparse matrix (CSC/CSR format recommended). Should be standardized before calling unless is_standardized=True.
kernel (Kernel) – Pre-constructed kernel object (Kernel, SpatialKernel, FFTKernel, or scipy.sparse matrix).
null_params (dict, optional) – Pre-computed null distribution parameters from compute_null_params(). If None, computed on-the-fly using ‘welch’ method (only accurate when kernel is positive semi-definite).
return_pval (bool, default True) – If True, returns (Q, pval) tuple; if False, returns Q only.
is_standardized (bool, default False) – If True, skips Z-score standardization internally (assumes input is N(0,1)).
chunk_size (int, default -1) – Number of features to process in each chunk. If -1, processes all features at once. Useful for large feature sets to reduce memory usage. Must be <= M.
show_progress (bool, default False) – If True, displays a progress bar during chunk processing.
- Returns:
Q (float or np.ndarray) – Test statistic value(s). Shape (M,) if input was 2D, scalar if input was 1D.
pval (float or np.ndarray, optional) – Tail probability under null hypothesis. Only returned if return_pval=True. Same shape as Q.
- Raises:
ValueError – If kernel dimensions don’t match data size or if null_params is None and kernel is not a Kernel object.
Notes
Under H₀: data is spatially independent. Under H₁: mean-shift present.
The test statistic Q = x^T K x, where K is the kernel matrix, approximately follows a chi-squared mixture distribution:
$$Q \sim \sum_{i=1}^{n} \lambda_i \chi^2_{1}$$
where $\lambda_i$ are the kernel eigenvalues.
By default, the null is approximated by Welch-Satterthwaite moment matching. For more accurate tail probabilities, pass null_params = {'method': 'liu'} or null_params = compute_null_params(kernel, method='liu').
Examples
>>> import numpy as np
>>> from quadsv.statistics import spatial_q_test
>>> coords = np.random.randn(100, 2)
>>> kernel = SpatialKernel.from_coordinates(coords, method='gaussian')
>>> data = np.random.randn(100)
>>> Q, pval = spatial_q_test(data, kernel)
>>> # Sparse matrix example
>>> from scipy.sparse import csr_matrix
>>> sparse_data = csr_matrix(np.random.randn(100, 1000))
>>> Q, pval = spatial_q_test(sparse_data, kernel, chunk_size=100, show_progress=True)
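The statistic itself is easy to reproduce with plain NumPy. The sketch below computes Q = x^T K x for every column of a dense matrix, optionally in chunks. The helper name `q_stats_sketch`, the unit-bandwidth Gaussian kernel, and the dense layout are assumptions for illustration; quadsv's kernel objects and sparse handling are not modeled here, and unlike spatial_q_test the helper always returns an array.

```python
import numpy as np


def q_stats_sketch(X, K, chunk_size=-1):
    """Compute Q_j = x_j^T K x_j for each (Z-scored) column x_j of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]                      # treat a vector as a single feature
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # Z-score each feature
    m = Z.shape[1]
    step = m if chunk_size == -1 else chunk_size
    out = np.empty(m)
    for start in range(0, m, step):
        block = Z[:, start:start + step]
        # Column-wise quadratic forms without building the full M x M product
        out[start:start + step] = np.einsum("ij,ij->j", block, K @ block)
    return out


rng = np.random.default_rng(0)
coords = rng.standard_normal((100, 2))
sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dist)   # Gaussian kernel, unit bandwidth (an assumption)
X = rng.standard_normal((100, 20))

Q_all = q_stats_sketch(X, K)                   # all 20 features at once
Q_chunked = q_stats_sketch(X, K, chunk_size=8)  # same result, 8 features per chunk
```

Chunking only changes peak memory (the size of `K @ block`), not the values of Q.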
- quadsv.statistics.spatial_r_test(Xn: ndarray, Yn: ndarray, kernel: Kernel, null_params: dict | None = None, return_pval: bool = True, is_standardized: bool = False) → float | ndarray | Tuple[float | ndarray, float | ndarray]
Bivariate spatial R-test for correlation between two spatial variables.
Computes the pairwise spatial statistic R = x^T K y, testing for spatial association between two variables. Supports batch processing.
- Parameters:
Xn (np.ndarray) – First input data vector or batch. Shape (N,) or (N, M).
Yn (np.ndarray) – Second input data vector or batch. Shape (N,) or (N, M) matching Xn.
kernel (Kernel) – Pre-constructed kernel object compatible with xtKy() method.
null_params (dict, optional) – Pre-computed null distribution parameters. Should include ‘var_R’. If None, computed on-the-fly from kernel traces.
return_pval (bool, default True) – If True, returns (R, pval) tuple; if False, returns R only.
is_standardized (bool, default False) – If True, skips Z-score standardization internally.
- Returns:
R (float or np.ndarray) – Test statistic value(s). Shape (M,) if input was 2D, scalar if input was 1D.
pval (float or np.ndarray, optional) – Tail probability under null hypothesis (two-tailed test). Only returned if return_pval=True. Based on Normal approximation.
- Raises:
ValueError – If Xn and Yn shapes don’t match or kernel dimensions are incompatible.
Notes
Under H₀: the two variables are spatially uncorrelated.
The test statistic R = x^T K y is approximated as Normal under the null:
$$R \sim N(0, \text{Trace}(K^2))$$
P-value is computed as two-tailed: 2 × Pr(|R| > |r_obs|).
Examples
>>> import numpy as np
>>> from quadsv.statistics import spatial_r_test
>>> coords = np.random.randn(100, 2)
>>> kernel = SpatialKernel.from_coordinates(coords, method='gaussian')
>>> x_data = np.random.randn(100)
>>> y_data = np.random.randn(100)
>>> R, pval = spatial_r_test(x_data, y_data, kernel)
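The Normal approximation in the Notes can be written out directly. This is a minimal sketch with a dense symmetric kernel matrix; the helper name `r_test_sketch` and the unit-bandwidth Gaussian kernel are assumptions, and the trace is computed exactly rather than through a kernel object's own methods.

```python
import numpy as np
from scipy.stats import norm


def r_test_sketch(x, y, K):
    """Return R = x^T K y and a two-tailed Normal p-value with Var[R] = tr(K^2)."""
    zx = (x - x.mean()) / x.std()   # Z-score both inputs
    zy = (y - y.mean()) / y.std()
    r = zx @ K @ zy
    var_r = np.sum(K * K)           # tr(K^2) for symmetric K
    pval = 2.0 * norm.sf(abs(r) / np.sqrt(var_r))  # 2 * Pr(|R| > |r_obs|)
    return r, pval


rng = np.random.default_rng(0)
coords = rng.standard_normal((100, 2))
sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-0.5 * sq_dist)          # Gaussian kernel, unit bandwidth (an assumption)
x = rng.standard_normal(100)
y = rng.standard_normal(100)

R, pval = r_test_sketch(x, y, K)
```

Because K is symmetric, swapping x and y leaves R unchanged, and negating one input flips the sign of R without changing the two-tailed p-value.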
Null approximation methods
Three strategies are available via the null_approx parameter:
clt: O(N) via Hutchinson trace. For indefinite kernels and Z-scores.
welch: O(N) via Hutchinson trace. Gamma moment matching. Default for large N.
liu: O(N³) eigendecomposition. 4-moment weighted chi-square. Most accurate for N < 5000 or FFT grids.
See compute_null_params() for details.
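The Hutchinson trace estimates behind ‘clt’ and ‘welch’ can be sketched as below. The estimator needs only matrix-vector products, so its cost is a fixed number of kernel applications. The function name, the Rademacher probes, and the default of 64 probes are assumptions for illustration, not quadsv settings.

```python
import numpy as np


def hutchinson_traces_sketch(matvec, n, n_probes=64, seed=0):
    """Estimate tr(K) and tr(K^2) for symmetric K using only products K @ Z."""
    rng = np.random.default_rng(seed)
    Z = rng.choice([-1.0, 1.0], size=(n, n_probes))  # Rademacher probe vectors
    KZ = matvec(Z)
    tr_k = np.mean(np.sum(Z * KZ, axis=0))    # E[z^T K z] = tr(K)
    tr_k2 = np.mean(np.sum(KZ * KZ, axis=0))  # E[||K z||^2] = tr(K^2)
    return tr_k, tr_k2


# Diagonal kernel: Rademacher probes give the traces exactly.
d = np.array([3.0, 1.0, 0.5, 0.25])
tr_k, tr_k2 = hutchinson_traces_sketch(lambda Z: d[:, None] * Z, n=4)

mean_Q = tr_k          # first null moment of Q
var_Q = 2.0 * tr_k2    # second null moment of Q
```

For a general kernel the estimates are noisy, and the error shrinks as the number of probes grows.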