quadsv.statistics#
Functions#
|
Pick a per-backend-optimal |
|
Pre-compute null distribution parameters for spatial tests. |
|
Effective rank (participation ratio) of a covariance matrix. |
|
Spatial-pattern diversity across genes within a single sample. |
|
Liu approximation to a linear combination of non-central chi-squared variables. |
|
Univariate spatial Q-test for detecting spatial variability. |
|
Bivariate spatial R-test for correlation between two spatial variables. |
|
Spatial-pattern diversity of the within-group residual covariance. |
Module Contents#
- quadsv.statistics.auto_chunk_size(kernel, n_jobs=1, budget_bytes=_DEFAULT_CHUNK_BUDGET)[source]#
Pick a per-backend-optimal
chunk_sizefor the Q / R test.The returned value is used by
spatial_q_test()/spatial_r_test()(and byDetectorGrid.compute_qstat()/DetectorIrregular.compute_qstat()) to split a multi-feature batch into chunks. It is the smaller of two caps:Cache sweet-spot cap — empirical sweep of per-feature time vs
chunkatn ∈ {30k, 100k, 300k, 1M}:Backend
chunk cap
FFTKernel32
NUFFTKernel64
MatrixKernel (any sub-type)
16 (
n < 200k); 8 (n ≥ 200k)Matrix backends don’t vectorise over RHS columns (scipy CSR SpMV, SuperLU triangular solve), and the chunk size cap is determined empirically for best per-feature speed under the given memory constraints. FFT / NUFFT do benefit from BLAS /
n_transfbatching, but their complex workspace spills L3 past the listed cap (15× slowdown for FFT at chunk=512, 1.9× for NUFFT at chunk=256).Memory cap —
budget_bytes / n_jobs // per_feat, whereper_featis the backend-specific transient bytes per feature:MatrixKernel dense / sparse:
16 · nMatrixKernel precision-stored CAR:
24 · nFFTKernel:
24 · nNUFFTKernel:
16 · ny·nx + 8 · n
- Parameters:
kernel (Kernel) – The backend kernel the chunk will operate on.
n_jobs (int, default 1) – Number of parallel workers the caller plans to use. The
budget_bytesis divided byn_jobsso aggregate live memory stays bounded.budget_bytes (int, default 2 GiB) – Aggregate live-memory cap across all workers.
- Returns:
A
chunk_sizein[8, chunk_cap]. The lower bound of 8 ensures measurements stay meaningful even when a single feature consumes most of the per-worker budget.- Return type:
int
- quadsv.statistics.compute_null_params(kernel, method='welch', k_eigen=None, dirichlet_correction=True, liu_n_probes=None)[source]#
Pre-compute null distribution parameters for spatial tests.
Call this ONCE before running parallel tests on thousands of features. Caches the expensive computations (traces, cumulants, shifted-χ² fit) for reuse across both Q-tests and R-tests.
- Parameters:
kernel (Kernel) – The spatial kernel object (MatrixKernel, FFTKernel, NUFFTKernel, or compatible).
method ({'clt', 'welch', 'liu'}, default 'welch') –
Null approximation method for the Q-test. The R-test entry
var_R = trace(K̃²)is always populated alongside, regardless ofmethod— R-tests use a Normal approximation and only need this one moment.’clt’: Central Limit Theorem (Z-score normal approximation)
’welch’: Welch-Satterthwaite moment matching (fast, uses traces)
’liu’: Liu eigenvalue-based approximation (accurate tail, slower)
k_eigen (int, optional) – Number of top eigenvalues to compute if method=’liu’ and kernel is sparse. If None, computes all available eigenvalues. Ignored when
liu_n_probesis set (eigenvalues are bypassed entirely).liu_n_probes (int, optional) – If set, bypass the eigendecomposition for
method='liu'and estimate the four spectral cumulantsc_p = trace(K̃^p),p = 1..4, directly from the kernel via Hutchinson probing (_hutchinson_cumulants()). Cost drops from \(O(n^3)\) (dense eigensolve) or \(O(r^3)\) (reduced Toeplitz-M) to \(2 \cdot n_\mathrm{probes}\) kernel matvecs, at the cost of \(O(n_\mathrm{probes}^{-1/2})\) Monte-Carlo error in each cumulant. Rule of thumb:n_probes = 60gives Liu p-values within~5\%of the eigenvalue-exact baseline;n_probes = 120within~0.2\%. WhenNone(default), the eigenvalue path is used.dirichlet_correction (bool, default True) – When
True: use the finite-nDirichlet(1/2) ratioVar[Q] = 2 · (m · trace((HKH)²) - trace(HKH)²) / (m+2)withm = n-1. WhenFalse: drop themean²term to the large-nlimitVar[Q] = 2 · trace((HKH)²), a monotonic upper bound that slightly overestimatesVar[Q]at finiten. See Notes for the derivation.
- Returns:
dict[str, float or np.ndarray] – Always populated (regardless of
method):'method': str — the Q-test approximation selected.'var_R': float —trace(K̃²), the null variance ofR(used byspatial_r_test()).
Method-specific additions:
method='liu'(default for FFT / NUFFT kernels):'cumulants':dict {1: c_1, 2: c_2, 3: c_3, 4: c_4}withc_p = trace(K̃^p). Computed from the full eigendecomposition when available (liu_n_probes is None) or from \(2m\) Hutchinson probes otherwise.'liu_coef':dictwith cached Liu coefficients{'mu_Q', 'sigma_Q', 'mu_x', 'sigma_x', 'dof_x', 'delta_x'}derived fromcumulantsonce; consumed byspatial_q_test()so per-feature p-values reduce to a pure \(t\)-broadcast.
method='welch'(default for MatrixKernel Q-tests):'mean_Q':trace(K)'var_Q':2 · trace(K²)'scale_g': Welch scale parametervar_Q / (2 · mean_Q)'df_h': Welch df2 · mean_Q² / var_Q
method='clt':'mean_Q','var_Q'only.
Consumers (
spatial_q_test/spatial_r_test) read only the keystheir approximation needs; the dict is safe to reuse across calls.
- Raises:
AssertionError – If method is not one of ‘clt’, ‘welch’, ‘liu’.
- Return type:
dict[str, float | ndarray]
Examples
>>> kernel = MatrixKernel.from_coordinates(coords, method='gaussian') >>> params = compute_null_params(kernel, method='welch') >>> Q, pval = spatial_q_test(data, kernel, null_params=params) >>> R, r_pval = spatial_r_test(x, y, kernel, null_params=params)
Notes
spatial_q_test()standardizes its input as \(Z = (X - \bar{X}\,\mathbf{1}) / \sigma\), so the realized quadratic form is\[Q \;=\; Z^{\top} K Z \;=\; X^{\top}\, H K H\, X / \sigma^{2}, \qquad H = I - \mathbf{1}\mathbf{1}^{\top} / n.\]Null moments are therefore for the double-centered operator \(\tilde{K} = H K H\), not raw \(K\). \(Q\) is additionally a ratio of quadratic forms — the denominator \(\sigma^{2} = X^{\top} H X / (n-1)\) is a random variable correlated with the numerator.
dirichlet_correction=Trueapplies the finite-\(n\) correction derived from the Dirichlet(1/2) distribution of \(Y_i = X_i^{2} / \sum_j X_j^{2}\):\[\mathrm{Var}[Q] \;=\; \frac{2 \bigl[\, m \cdot \mathrm{tr}(\tilde K^{2}) - \mathrm{tr}(\tilde K)^{2} \,\bigr]}{m + 2}, \qquad m = n - 1.\]With
dirichlet_correction=Falsethe finite term drops out and \(\mathrm{Var}[Q] = 2\,\mathrm{tr}(\tilde K^{2})\) (large-\(n\) limit).
- quadsv.statistics.effective_rank(cov, weights=None)[source]#
Effective rank (participation ratio) of a covariance matrix.
Computes
\[K_\mathrm{eff} \;=\; \frac{\big(\sum_k \lambda_k\big)^2}{\sum_k \lambda_k^2}\]where \(\lambda_k\) are the eigenvalues of
cov(or, whenweightsis given, of \(W^{1/2} \mathrm{cov}\, W^{1/2}\) with \(W = \mathrm{diag}(w)\)). The result is bounded by 1 (rank-1, all variance on a single direction) andK = cov.shape[0](uniformly spread, all eigenvalues equal). It coincides with the standard inverse Herfindahl index of the (normalised) eigenvalue distribution and quantifies the “effective number of independent components” of a quadratic-form statistic \(T^2 = X^\top \mathrm{cov} X\) where \(X \sim \mathcal{N}(0,I)\).- Parameters:
cov (np.ndarray) – Symmetric
(K, K)covariance matrix. Negative eigenvalues from numerical noise are clipped to 0.weights (np.ndarray, optional) – Non-negative weights of length
K. When provided, returns the effective rank of the weighted form \(W^{1/2} \mathrm{cov}\,W^{1/2}\), useful for analysing how a frequency-weighted L2 statistic actually distributes its sensitivity across eigen-directions.
- Returns:
Effective rank in
[1, K]. Returnsnanwhen the trace is non-positive (degenerate covariance).- Return type:
float
Examples
>>> import numpy as np >>> effective_rank(np.eye(10)) 10.0 >>> # Rank-1 outer product >>> v = np.zeros(10); v[0] = 1.0 >>> effective_rank(np.outer(v, v)) 1.0
- quadsv.statistics.gene_pattern_diversity(spectra, weights=None, *, eps=1e-12)[source]#
Spatial-pattern diversity across genes within a single sample.
Quantifies how heterogeneous the per-gene spatial-frequency profiles are within one sample. Computes the cross-gene covariance of the log-spectra,
\[\hat\Sigma_\mathrm{genes} \;=\; \frac{1}{G - 1} \sum_g \big(\ell_g - \bar\ell\big)\big(\ell_g - \bar\ell\big)^\top\]where \(\ell_g \in \mathbb{R}^K\) is gene \(g\)’s radially-binned log-spectrum and \(\bar\ell\) the mean across genes, then returns
effective_rank(Σ_genes, weights).Interpretation:
Low diversity (\(K_\mathrm{eff} \approx 1\)): most genes share the same spatial-frequency profile — the sample’s spatial patterns collapse onto a single dominant mode (e.g. all “smooth” or all “punctate”).
High diversity (\(K_\mathrm{eff} \approx K\)): genes vary widely in their spatial structure — the sample carries a rich mix of spatial scales.
- Parameters:
spectra (np.ndarray) –
(n_genes, K)raw spectrum matrix (typically a single sample’s radially-binned spectrum). Log is taken internally with anepsfloor.weights (np.ndarray, optional) – Per-bin weights, same semantics as
effective_rank().eps (float, default 1e-12) – Floor for
log(spectra)to handle exact-zero bins.
- Return type:
float
- quadsv.statistics.liu_sf(t, lambs, dofs=None, deltas=None, kurtosis=False, n=None)[source]#
Liu approximation to a linear combination of non-central chi-squared variables.
One-shot convenience wrapper equivalent to
_liu_apply(t, _liu_prepare(lambs, ...)). For multi-feature workloads, prefer the split form: call_liu_prepare()once on the spectrum and_liu_apply()for each Q-batch (compute_null_params()already caches the coefficients undernull_params['liu_coef'], sospatial_q_test()does this automatically).- Parameters:
t (float or np.ndarray) – Test statistic value(s). Array input is broadcast efficiently through a single
scipy.stats.ncx2.sf()call.lambs (np.ndarray) – Eigenvalues of the kernel matrix, shape
(n_evals,).dofs (np.ndarray, optional) – Per-eigenvalue degrees of freedom. Default: ones (chi-squared).
deltas (np.ndarray, optional) – Non-centrality parameters. Default: zeros (central).
kurtosis (bool, default False) – If True, use the kurtosis-based edge-case approximation.
n (int, optional) – Sample size. When provided, applies the Dirichlet(1/2) variance correction
Var[Q] = 2·(m·c_2 - c_1²)/(m+2)withm = n-1for the z-scored ratioQ = XᵀK̃X/σ̂². Essential for broad-spectrum PSD kernels (CAR, graph_laplacian) on dense grids, where the large-\(n\) limit2·c_2overestimatesVar[Q]and collapses the tail to zero. DefaultNonekeeps the original large-\(n\) behavior for back-compat with callers supplying a raw eigenvalue mixture.
- Returns:
Tail probability
Pr(Q > t)with the same shape ast.- Return type:
float or np.ndarray
- quadsv.statistics.spatial_q_test(Xn, kernel, null_params=None, return_pval=True, is_standardized=False, chunk_size='auto', show_progress=False)[source]#
Univariate spatial Q-test for detecting spatial variability.
Top-level chunking wrapper — splits the feature batch along the trailing axis into blocks of
chunk_sizefeatures, dispatches each block to the backend-specific per-chunk helper (quadsv.kernels.fft._q_test_fft(),quadsv.kernels.nufft._q_test_nufft(), or_q_test_matrix()), and concatenates the results. The per-chunk helpers do not handle chunking themselves.- Parameters:
Xn (np.ndarray or scipy.sparse matrix) – Input data of shape
(n,)/(n, M)for MatrixKernel and NUFFTKernel, or(ny, nx)/(ny, nx, M)for FFTKernel. Can be dense numpy array or sparse matrix (CSC/CSR recommended) for MatrixKernel; FFT/NUFFT paths require dense input.kernel (Kernel) – Pre-constructed
Kernel(MatrixKernel/FFTKernel/NUFFTKernel) or a raw dense / sparse kernel matrix.null_params (dict, optional) – Pre-computed null distribution parameters from
compute_null_params(). Resolved once at the top level ifNoneand shared across chunks (no redundant recomputation).return_pval (bool, default True) – If True, returns
(Q, pval); else returnsQonly.is_standardized (bool, default False) – If True, skips Z-score standardization internally.
chunk_size (int or
'auto', default'auto') – Number of features processed per per-chunk dispatch call.'auto'defers toauto_chunk_size()(backend-specific cache sweet spot under a 2 GiB live-memory budget);-1processes the full batch in a single call. For a cross-backend cost model see Scalable Computation.show_progress (bool, default False) – If True, displays a tqdm bar over chunks (only when
M > chunk_size).
- Returns:
Q (float or np.ndarray) – Test statistic value(s). Shape
(M,)for 2-D / 3-D inputs, scalar for 1-D.pval (float or np.ndarray, optional) – Tail probability under null hypothesis; returned only if
return_pval=True. Same shape as Q.
- Return type:
Notes
Under H₀: data is spatially independent. Under H₁: mean shift present. The test statistic
Q = xᵀ K xapproximates a chi-squared mixture under the null; see Theoretical Results and Scalable Computation.Examples
>>> coords = np.random.randn(100, 2) >>> kernel = MatrixKernel.from_coordinates(coords, method='gaussian') >>> data = np.random.randn(100) >>> Q, pval = spatial_q_test(data, kernel) >>> # Sparse-matrix batch of features (auto-chunked): >>> from scipy.sparse import csr_matrix >>> sparse_data = csr_matrix(np.random.randn(100, 1000)) >>> Q, pval = spatial_q_test(sparse_data, kernel, show_progress=True)
- quadsv.statistics.spatial_r_test(Xn, Yn, kernel, null_params=None, return_pval=True, is_standardized=False, chunk_size='auto', show_progress=False)[source]#
Bivariate spatial R-test for correlation between two spatial variables.
Top-level chunking wrapper — splits the paired feature batch along the trailing axis into blocks of
chunk_sizefeatures, dispatches each block to the backend-specific per-chunk helper (quadsv.kernels.fft._r_test_fft(),quadsv.kernels.nufft._r_test_nufft(), or_r_test_matrix()), and concatenates the results. The per-chunk helpers do not handle chunking themselves.- Parameters:
Xn (np.ndarray or scipy.sparse matrix) – First input. Shape
(n,)/(n, M)for MatrixKernel and NUFFTKernel,(ny, nx)/(ny, nx, M)for FFTKernel.Yn (np.ndarray or scipy.sparse matrix) – Second input, same shape as
Xn(paired R-test). ForNUFFTKernela bipartite mode withM_x != M_yis passed through without chunking.kernel (Kernel) – Pre-constructed
Kernel.null_params (dict, optional) – Pre-computed null parameters; only
'var_R'is consumed. Resolved once at the top level ifNoneand shared across chunks.return_pval (bool, default True) – If True, returns
(R, pval); else returnsRonly.is_standardized (bool, default False) – If True, skips Z-score standardization internally.
chunk_size (int or
'auto', default'auto') – Number of feature pairs processed per per-chunk dispatch call.'auto'defers toauto_chunk_size();-1processes the full batch in a single call. For a cross-backend cost model see Scalable Computation.show_progress (bool, default False) – If True, displays a tqdm bar over chunks (only when
M > chunk_size).
- Returns:
R (float or np.ndarray) – Test statistic value(s). Shape
(M,)for 2-D / 3-D inputs, scalar for 1-D. For bipartite NUFFT input (M_x != M_y), shape(M_x, M_y).pval (float or np.ndarray, optional) – Two-tailed tail probability under null hypothesis; returned only if
return_pval=True.
- Return type:
Notes
Under H₀,
R = xᵀ K yis approximated as \(\mathcal{N}(0, \mathrm{trace}((HKH)^2))\) on z-scored inputs. See Theoretical Results and Scalable Computation.Examples
>>> coords = np.random.randn(100, 2) >>> kernel = MatrixKernel.from_coordinates(coords, method='gaussian') >>> x_data = np.random.randn(100) >>> y_data = np.random.randn(100) >>> R, pval = spatial_r_test(x_data, y_data, kernel)
- quadsv.statistics.within_group_pattern_diversity(spectra, groups, weights=None, *, eps=1e-12)[source]#
Spatial-pattern diversity of the within-group residual covariance.
For a cohort of samples partitioned into two groups, computes the pooled-across-genes within-group covariance of log-spectra (the same estimator used by
log_l2 + null='wald'in the comparator), then returns its effective rank.Interpretation:
Low diversity (\(K_\mathrm{eff} \approx 1\)): within-group sample-to-sample variation aligns with one spatial-frequency direction. Wald-type tests on this cohort effectively reduce to a 1-DoF test → high power per direction but very sensitive to estimation noise in that single eigenvalue.
High diversity (\(K_\mathrm{eff} \approx K\)): noise spreads over many directions; Wald’s analytic null is more accurate (CLT smoothing).
- Parameters:
spectra (np.ndarray) –
(n_samples, n_genes, K)spectrum tensor (raw, not logged).groups (np.ndarray) –
(n_samples,)with exactly two distinct labels.weights (np.ndarray, optional) – Per-bin weights, same semantics as
effective_rank().eps (float, default 1e-12) – Floor for
log(spectra).
- Returns:
Effective rank of the pooled within-group covariance.
- Return type:
float