Theoretical Results#
In our accompanying paper, we demonstrate that virtually all major spatially variable gene (SVG) detection methods, including graph-based ones like Moran’s I, parametric models, and non-parametric dependence tests, reduce to a single quadratic-form statistic (Q-statistic),
\[Q_n = \mathbf{z}^\top \mathbf{K} \mathbf{z},\]
where \(\mathbf{z}\) is the standardized gene expression vector and \(\mathbf{K}\) is a kernel matrix encoding spatial structure. Under the null hypothesis of spatial independence, \(Q_n\) follows a weighted chi-square distribution whose weights are the eigenvalues of \(\mathbf{K}\). We can approximate the null distribution using moment-matching methods to compute p-values efficiently, yielding a Q-test. The choice of kernel \(\mathbf{K}\) critically affects the consistency and power of the resulting Q-test.
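As a minimal sketch of a quadratic-form statistic of the kind described above, the snippet below standardizes one expression vector and evaluates \(\mathbf{z}^\top \mathbf{K} \mathbf{z}\) with a Gaussian kernel. The toy coordinates, the simulated gene, and the bandwidth are assumptions made for illustration; this is plain NumPy and not the package's own API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 spots with 2-D coordinates and one simulated gene.
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))
expr = np.sin(coords[:, 0]) + 0.5 * rng.standard_normal(n)

# Standardize expression to zero mean and unit variance.
z = (expr - expr.mean()) / expr.std()

# Gaussian kernel on pairwise squared distances (bandwidth is an assumed value).
sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
bandwidth = 1.0
K = np.exp(-sq_dist / (2.0 * bandwidth**2))

# Quadratic-form statistic z^T K z.
Q = float(z @ K @ z)
print(f"Q_n = {Q:.3f}")
```

Forming the dense kernel costs O(N²) memory, so this form is only practical for small N.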
Here we summarize key theoretical results underpinning the quadratic form.
Theorems#
Theorem 1: Q-tests detect mean shifts only
All spatial Q-tests detect mean-shift patterns (\(\mathbb{E}[\mathbf{x}|S=\mathbf{s}] \neq \mathbb{E}[\mathbf{x}]\)).
This is a direct result of using a linear kernel \(l(x_i, x_j) = x_i x_j\) in the quadratic form, which reduces the conditional distribution \(X|S=s_i\) to its mean. To investigate higher-order spatial moments (variance, distributional changes), use non-linear kernels such as Gaussian or polynomial kernels (e.g., \(Q_n = (\mathbf{z}^2)^\top\mathbf{K} \mathbf{z}^2\)).
However, in applications such as spatial transcriptomics, this distributional information is absent because we observe only a single realization \((x_i, s_i)\) drawn from \(X \mid S=s_i\) at each location. This constraint blurs the line between mean independence and statistical independence. Motivated by this observation, we adopt a functional perspective, treating the “signal” as a deterministic element of a Hilbert space \(f \in L^2(\mathcal{S})\). Using the spectral theory of kernel operators, we derive the following condition for test consistency regarding mean dependence.
Theorem 2: Consistency requires positive definiteness
A spatial Q-test is universally consistent (power → 1 as \(N \to \infty\)) against all non-constant (deterministic) patterns if and only if the kernel \(\mathbf{K}\) is strictly positive definite.
Under \(H_0\), \(Q_n\) approximately follows a weighted chi-square distribution: \(Q_n \sim \sum_i \lambda_i \chi^2_1\). If some \(\lambda_i < 0\) (indefinite kernel), signal components in the negative eigenspace contribute negatively to \(Q_n\) and cancel contributions from the positive eigenspace (spectral cancellation), which reduces test power.
Implication: Choose kernels with positive eigenvalues:
| Kernel | Spectrum | Consistency |
|---|---|---|
| Gaussian | Positive definite | ✓ Guaranteed |
| Matérn | Positive definite | ✓ Guaranteed |
| Moran’s I | Indefinite | ✗ Spectral cancellation |
| Laplacian | Semi-definite | ✓ Guaranteed (high-frequency Moran’s I) |
| CAR (inverse Laplacian) | Positive definite | ✓ Guaranteed (low-frequency Moran’s I) |
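The spectral cancellation in Theorem 2 can be illustrated on a toy graph. The sketch below uses an 8-node ring (an assumption chosen because its eigenvectors are known Fourier modes), takes the raw adjacency as a Moran-style kernel, and builds a CAR-style kernel of the form \((\mathbf{I} - \rho \tilde{\mathbf{W}})^{-1}\). A non-constant pattern is then constructed whose positive and negative contributions cancel under the indefinite kernel but not under the positive definite one.

```python
import numpy as np

n = 8
idx = np.arange(n)

# Symmetric adjacency of a ring graph: each node is linked to its two neighbors.
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W[idx, (idx - 1) % n] = 1.0

# Moran-style kernel: the adjacency itself, which has negative eigenvalues.
moran_eigs = np.linalg.eigvalsh(W)

# CAR-style kernel (I - rho * W_tilde)^{-1}. On a ring every node has degree 2,
# so the row-normalized adjacency W / 2 stays symmetric.
rho = 0.9
K_car = np.linalg.inv(np.eye(n) - rho * W / 2.0)
car_eigs = np.linalg.eigvalsh(K_car)

# Two non-constant, zero-mean patterns that are eigenvectors of W.
smooth = np.cos(2.0 * np.pi * idx / n)  # positive eigenvalue under W
alternating = (-1.0) ** idx             # eigenvalue -2 under W

# Scale the smooth part so both contributions cancel exactly under W.
q_smooth = smooth @ W @ smooth
q_alt = alternating @ W @ alternating
mixed = np.sqrt(-q_alt / q_smooth) * smooth + alternating

q_moran = mixed @ W @ mixed
q_car = mixed @ K_car @ mixed

print(f"min eigenvalue: Moran-style {moran_eigs.min():.3f}, CAR-style {car_eigs.min():.3f}")
print(f"Q on mixed pattern: Moran-style {q_moran:.3e}, CAR-style {q_car:.3f}")
```

The mixed pattern is clearly non-constant, yet the indefinite kernel returns a statistic of essentially zero, while the CAR-style kernel keeps every spectral component positive.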
CAR is a scalable correction to Moran’s I#
The Conditional Autoregressive (CAR) kernel provides strict positive definiteness:
\[\mathbf{K}_{\text{CAR}} = (\mathbf{I} - \rho \tilde{\mathbf{W}})^{-1},\]
where:
\(\tilde{\mathbf{W}}\) is the row-normalized adjacency matrix
\(0 < \rho < 1\) is the autoregressive parameter (default: 0.9)
\((\mathbf{I} - \rho \tilde{\mathbf{W}})\) is the CAR precision matrix
Key properties:
Strictly positive definite for all \(0 < \rho < 1\)
Theoretically consistent (Theorem 2)
Scalable with a sparse precision matrix using implicit kernel operations
Polynomial spectral decay that emphasizes smooth, large-scale patterns while maintaining a heavy tail for mid/high frequencies
Recommendation: Use CAR for all spatial pattern detection tasks.
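The "implicit kernel operations" property can be sketched as follows: rather than forming the dense inverse, factorize the sparse precision matrix \((\mathbf{I} - \rho \tilde{\mathbf{W}})\) once and obtain \(\mathbf{K}\mathbf{z}\) by a sparse solve. The example assumes a periodic 4-neighbor grid (so every node has degree 4 and row normalization keeps the matrix symmetric) and uses SciPy's `splu`; the helper name `ring_adjacency` and the simulated data are hypothetical, and the package may implement this differently.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu


def ring_adjacency(n):
    """Sparse adjacency of a ring with n >= 3 nodes (two neighbors per node)."""
    idx = np.arange(n)
    rows = np.concatenate([idx, idx])
    cols = np.concatenate([(idx + 1) % n, (idx - 1) % n])
    return sp.csr_matrix((np.ones(2 * n), (rows, cols)), shape=(n, n))


side = 30
n_nodes = side * side
ring = ring_adjacency(side)
eye = sp.identity(side, format="csr")

# 4-neighbor adjacency of a periodic grid, built from two ring graphs.
W = sp.kron(ring, eye) + sp.kron(eye, ring)

# Row-normalize: divide each row by its degree (4 everywhere on this grid).
deg = np.asarray(W.sum(axis=1)).ravel()
W_tilde = sp.diags(1.0 / deg) @ W

# Sparse CAR precision matrix and its one-off LU factorization.
rho = 0.9
precision = (sp.identity(n_nodes) - rho * W_tilde).tocsc()
lu = splu(precision)

# Simulated feature: a smooth wave along one grid axis plus noise.
rng = np.random.default_rng(0)
grid_row = np.repeat(np.arange(side), side)
x = np.sin(2.0 * np.pi * grid_row / side) + 0.5 * rng.standard_normal(n_nodes)
z = (x - x.mean()) / x.std()

# K z is obtained by solving (I - rho * W_tilde) u = z; K is never formed.
Kz = lu.solve(z)
Q = float(z @ Kz)
print(f"Q_n = {Q:.2f}")
```

The factorization is reused across genes, so each additional feature costs only one sparse triangular solve.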
Null distribution approximations#
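Under the null hypothesis (spatial independence), \(Q_n\) follows a weighted chi-square distribution:
\[Q_n \sim \sum_{i=1}^{m} \lambda_i \chi^2_1,\]
where \(\lambda_i\) are eigenvalues of \(\mathbf{K}\), the \(\chi^2_1\) terms are independent, and \(m = \text{rank}(\mathbf{K})\).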
Three approximation methods are provided, balancing accuracy and speed:
| Method | Complexity | Applicability | Use case |
|---|---|---|---|
| CLT | O(N) | Works for all kernels | Large N, indefinite kernels |
| Welch | O(N) | Positive semi-definite kernels only | Default, large N |
| Liu | O(N³) | Positive semi-definite kernels only | N ≤ 5000 or FFT grids |
CLT: Approximates \(Q_n\) as normal with mean \(\mu = \text{tr}(\mathbf{K})\) and variance \(\sigma^2 = 2\text{tr}(\mathbf{K}^2)\).
Welch/Satterthwaite: Matches first two moments to a scaled chi-square distribution using Hutchinson trace estimation. Recommended default.
Liu: Exact eigendecomposition followed by polynomial moment-matching. Most accurate but requires O(N³) computation.
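The CLT and Welch/Satterthwaite approximations above need only \(\text{tr}(\mathbf{K})\) and \(\text{tr}(\mathbf{K}^2)\). The sketch below estimates both traces with Hutchinson's estimator (Rademacher probes) and converts an observed statistic into upper-tail p-values. The function names, the probe count, and the toy Gaussian kernel are assumptions for illustration, not the package's implementation. The Satterthwaite step matches a scaled chi-square \(g\,\chi^2_h\) with \(g = \text{tr}(\mathbf{K}^2)/\text{tr}(\mathbf{K})\) and \(h = \text{tr}(\mathbf{K})^2/\text{tr}(\mathbf{K}^2)\), which reproduces the mean \(\text{tr}(\mathbf{K})\) and variance \(2\text{tr}(\mathbf{K}^2)\).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Toy positive definite kernel: Gaussian kernel on random 2-D coordinates.
n = 150
coords = rng.uniform(0.0, 10.0, size=(n, 2))
sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dist / 2.0)


def hutchinson_trace(matvec, size, num_probes, rng):
    """Estimate tr(A) as the average of v^T A v over random +/-1 probes v."""
    total = 0.0
    for _ in range(num_probes):
        probe = rng.choice([-1.0, 1.0], size=size)
        total += probe @ matvec(probe)
    return total / num_probes


# Only matrix-vector products with K are needed for either trace.
tr_K = hutchinson_trace(lambda v: K @ v, n, 200, rng)
tr_K2 = hutchinson_trace(lambda v: K @ (K @ v), n, 200, rng)


def clt_pvalue(q, tr_k, tr_k2):
    """Normal approximation with mean tr(K) and variance 2 tr(K^2)."""
    return float(stats.norm.sf((q - tr_k) / np.sqrt(2.0 * tr_k2)))


def satterthwaite_pvalue(q, tr_k, tr_k2):
    """Scaled chi-square g * chi2(h) matching the first two moments."""
    scale = tr_k2 / tr_k
    dof = tr_k**2 / tr_k2
    return float(stats.chi2.sf(q / scale, df=dof))


def standardize(v):
    return (v - v.mean()) / v.std()


features = {
    "null": standardize(rng.standard_normal(n)),
    "structured": standardize(np.sin(coords[:, 0]) + 0.5 * rng.standard_normal(n)),
}

results = {}
for label, z in features.items():
    q = float(z @ K @ z)
    results[label] = {
        "clt": clt_pvalue(q, tr_K, tr_K2),
        "welch": satterthwaite_pvalue(q, tr_K, tr_K2),
    }
    print(label, results[label])
```

The trace estimates are stochastic, so in practice the number of probes trades accuracy against cost.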
R-test: bivariate spatial co-expression#
The R-test extends Q-tests to test spatial correlation between two features:
\[R_{xy} = \mathbf{x}^\top \mathbf{K} \mathbf{y},\]
where \(\mathbf{x}, \mathbf{y}\) are standardized features.
Null distribution: \(R_{xy} \sim \mathcal{N}(0, \sigma^2)\) with \(\sigma^2 = \text{tr}(\mathbf{K}^2)\) under spatial independence.
Typical workflow:
Identify spatially variable genes (SVGs) via univariate Q-test
Test pairwise R-statistics among top SVGs
Control false discovery rate (FDR) across comparisons
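The workflow above can be sketched end to end for the bivariate step. The example assumes the statistic takes the bilinear form \(\mathbf{x}^\top \mathbf{K} \mathbf{y}\), uses the normal null with \(\sigma^2 = \text{tr}(\mathbf{K}^2)\), assumes a two-sided p-value, and applies a hand-written Benjamini-Hochberg adjustment. The gene names, simulated data, and helper functions are hypothetical.

```python
import itertools

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Toy spatial kernel: Gaussian kernel on random 2-D coordinates.
n = 200
coords = rng.uniform(0.0, 10.0, size=(n, 2))
sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dist / 2.0)


def standardize(v):
    return (v - v.mean()) / v.std()


# gene_a and gene_b share a spatial pattern; gene_c is pure noise.
pattern = np.sin(coords[:, 0])
features = {
    "gene_a": standardize(pattern + 0.5 * rng.standard_normal(n)),
    "gene_b": standardize(pattern + 0.5 * rng.standard_normal(n)),
    "gene_c": standardize(rng.standard_normal(n)),
}

# Null standard deviation: sigma^2 = tr(K^2).
sigma = np.sqrt(np.trace(K @ K))

# Bilinear statistic for every pair, with a two-sided normal p-value (assumed).
pairs = list(itertools.combinations(features, 2))
r_stats = np.array([features[a] @ K @ features[b] for a, b in pairs])
p_values = 2.0 * stats.norm.sf(np.abs(r_stats) / sigma)


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


q_values = benjamini_hochberg(p_values)
for (a, b), r, q in zip(pairs, r_stats, q_values):
    print(f"{a} vs {b}: R = {r:.1f}, adjusted p = {q:.3g}")
```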
FFT acceleration for regular grids#
For Block-Toeplitz kernels on regular grids (e.g., Visium HD, imaging), eigenvalues decouple via FFT:
\[\mathbf{K} = \mathbf{U} \Lambda \mathbf{U}^{H},\]
where \(\mathbf{U}\) is the FFT basis and \(\Lambda\) is diagonal.
Complexity reduction:
Explicit eigendecomposition: O(N³)
FFT eigenvalues: O(N log N)
Q-test computation: O(N log N)
Example: 1000×1000 grid
Explicit kernel: ~10 hours
FFT kernel: ~1 minute
Supported topologies: Square (4-neighbor, default) and hexagonal (6-neighbor).
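A minimal sketch of the FFT idea for the square 4-neighbor case, assuming periodic boundaries so that the kernel is exactly block-circulant and the 2-D FFT diagonalizes it. The eigenvalues of the row-normalized adjacency come from the FFT of its neighbor stencil, the CAR-style eigenvalues follow as \(1/(1 - \rho\lambda)\), and the quadratic form is evaluated in the frequency domain. A dense reference on the same small grid is included only to check the result; the grid size and variable names are illustrative.

```python
import numpy as np

rows, cols = 6, 5
n = rows * cols
rho = 0.9

# Row-normalized 4-neighbor stencil on a periodic grid: weight 1/4 per neighbor.
stencil = np.zeros((rows, cols))
stencil[1, 0] = stencil[-1, 0] = 0.25
stencil[0, 1] = stencil[0, -1] = 0.25

# The stencil is symmetric, so its 2-D FFT is real: these are the eigenvalues
# of the row-normalized adjacency. CAR-style eigenvalues follow elementwise.
w_eigs = np.fft.fft2(stencil).real
car_eigs = 1.0 / (1.0 - rho * w_eigs)

# Standardized feature laid out on the grid.
rng = np.random.default_rng(3)
x = rng.standard_normal((rows, cols))
z = (x - x.mean()) / x.std()

# Quadratic form in the frequency domain (Parseval): sum(lambda * |z_hat|^2) / N.
z_hat = np.fft.fft2(z)
q_fft = float((car_eigs * np.abs(z_hat) ** 2).sum() / n)

# Dense reference for verification only; infeasible on large grids.
W_tilde = np.zeros((n, n))
for i in range(rows):
    for j in range(cols):
        node = i * cols + j
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = ((i + di) % rows) * cols + (j + dj) % cols
            W_tilde[node, neighbor] = 0.25
K_dense = np.linalg.inv(np.eye(n) - rho * W_tilde)
q_dense = float(z.ravel() @ K_dense @ z.ravel())

print(f"FFT: {q_fft:.6f}  dense: {q_dense:.6f}")
```

With non-periodic boundaries the kernel is Block-Toeplitz rather than block-circulant, so this exact diagonalization should be read as an approximation or combined with padding.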
Practical summary#
| Method | Test consistency | Use case |
|---|---|---|
| Moran’s I | ✗ Spectral cancellation | Autocorrelation |
| Graph Laplacian | ✓ Guaranteed | High-frequency, local variation |
| CAR | ✓ Guaranteed | Low-frequency, smoothed patterns |
Best practice: Use the CAR kernel for consistent, high-power detection across functional patterns. On regular grids, use the FFT-accelerated CAR kernel.
See also#
Quick Start — Practical usage examples
Kernel Design — Kernel selection and design
quadsv.statistics — Statistical API reference