Scalable Computation

Scalable Computation#

Evaluating \(Q_n = \mathbf{z}^\top \tilde{\mathbf{K}} \mathbf{z}\) and its p-value naively requires the dense \(n \times n\) matrix \(\tilde{\mathbf{K}}\) and its full spectrum, which costs \(\mathcal{O}(n^3)\) time and \(\mathcal{O}(n^2)\) memory. Modern spatial-omics datasets (with \(n\) from \(10^3\) to \(10^7\) spots and \(m\) from \(10^3\) to \(10^4\) features) make that intractable.

quadsv replaces the dense baseline with scalable operators that never materialise \(\tilde{\mathbf{K}}\) and never run a full eigendecomposition. This page summarises the three orthogonal axes of the design: null-distribution approximations, structured or sparse kernel operators, and Fourier-accelerated primitives. It also tabulates their combined complexity.

Notation: \(c_p := \operatorname{tr}(\tilde{\mathbf{K}}^p)\) is the \(p\)-th spectral power sum. Under the null, \(\mathbb{E}[Q_n] = c_1\) and \(\operatorname{Var}[Q_n] = 2 c_2 + \epsilon_{\text{kurtosis}}\).

Null-distribution approximations#

Under \(H_0\), \(Q_n \xrightarrow{d} \sum_{i=1}^n \lambda_i^{(n)} \chi_1^2\). An exact p-value can be obtained by Davies’ inversion of

\[\varphi_{Q_n}(t) \;=\; \prod_{i=1}^n \bigl(1 - 2it\lambda_i^{(n)}\bigr)^{-1/2},\]

but only after paying \(\mathcal{O}(n^3)\) for the full spectrum plus \(\mathcal{O}(Kn)\) per feature, where \(K\) is the number of quadrature nodes. To avoid the eigendecomposition entirely, quadsv ships two moment-matching fits that evaluate in \(\mathcal{O}(1)\) per feature once the cumulants are cached:

Welch-Satterthwaite and CLT match \(c_1, c_2\) to a scaled central \(\chi^2\) (Welch, PSD kernels only) or to a Normal (CLT, valid for indefinite \(\tilde{\mathbf{K}}\) too). Both need only \(\operatorname{tr}(\tilde{\mathbf{K}})\) and \(\operatorname{tr}(\tilde{\mathbf{K}}^2)\).
Liu’s approximation matches the first four cumulants \(c_1 \ldots c_4\) to a shifted non-central \(\chi^2\). The higher cumulants are estimated via Hutchinson’s trace estimator,

\[c_p \;=\; \mathbb{E}\bigl[\mathbf{v}^\top \tilde{\mathbf{K}}^p \mathbf{v}\bigr] \;\approx\; \tfrac{1}{V}\,\textstyle\sum_{s=1}^{V} \mathbf{v}_s^\top \tilde{\mathbf{K}}^p \mathbf{v}_s,\]

with iid Rademacher probes \(\{\mathbf{v}_s\} \stackrel{\text{iid}}{\sim} \{\pm 1\}^n\). Two matvecs per probe deliver all four cumulants from the same probe pool.

Finite-:math:`n` Dirichlet correction. Because \(\mathbf{z} = \mathbf{H}\mathbf{x} / \hat{\sigma}\) uses the sample variance \(\hat{\sigma}^2 = \mathbf{x}^\top \mathbf{H} \mathbf{x} / (n-1)\) in the denominator, and that sample variance is correlated with the numerator, the exact null variance follows a Dirichlet(1/2) ratio:

\[\operatorname{Var}[Q_n] \;=\; \frac{2\,[\,(n-1)\,c_2 - c_1^{2}\,]}{n+1} \;<\; 2 c_2.\]

Welch and Liu both use this corrected variance. Without it, Liu’s right tail collapses to zero on broad-spectrum kernels where \(c_1^{2} \approx (n-1)\,c_2\). CAR on a dense grid is the prototypical failure mode.

Note

Moment-matching’s right tail decays faster than the true null on heavy-tailed spectra, which can over-reject in the extreme tails. Liu is the most tail-accurate of the three, Welch is tighter in the bulk, and CLT is the coarsest. When in doubt, run Liu.

Structured and sparse kernel operators#

For an arbitrary spatial coordinate cloud \(\{\mathbf{s}_i\}\), three strategies let matvecs scale sub-quadratically:

Sparse kernels (\(k\)-NN, thresholded distance). \(\mathbf{K}\) has \(\mathcal{O}(nk)\) non-zeros. Matvecs and Hutchinson cumulants cost \(\mathcal{O}(nk)\). Caveat: many ad-hoc sparse kernels (notably Moran’s I) are indefinite, which induces the cancellation pathology and costs power on composite spatial patterns.
Low-rank kernels \(\mathbf{K} \approx \mathbf{U}\mathbf{U}^\top\) with \(\mathbf{U} \in \mathbb{R}^{n \times r}\), \(r \ll n\). Matvecs cost \(\mathcal{O}(nr)\). Spectra fall out of an \(\mathcal{O}(r^3)\) eigen-problem on \(\mathbf{U}^\top\mathbf{U}\). Factorisation cost is \(\mathcal{O}(nr^2)\) via Nyström or Random Fourier Features. Caveat: finite rank creates an infinite-dimensional null space (blind spots), so power vanishes on high-frequency patterns.
Sparse precision (CAR). The precision matrix \(\mathbf{\Omega} = \mathbf{I} - \rho \widetilde{\mathbf{W}}\) is sparse with \(\mathcal{O}(nk)\) non-zeros even though \(\mathbf{K} = \mathbf{\Omega}^{-1}\) is dense. On typical 2-D graphs a sparse LU (or Cholesky) factorisation of \(\mathbf{\Omega}\) costs \(\mathcal{O}(n^{3/2})\) time with \(\mathcal{O}(n \log n)\) fill-in. Each subsequent \(\mathbf{K}\mathbf{v}\) is an \(\mathcal{O}(n \log n)\) triangular solve. Preserves full rank and positive definiteness.

Fourier-accelerated methods#

Under periodic boundary conditions, any translation-invariant kernel on a regular 2-D or 3-D grid is diagonalised by the Fourier basis:

\[\mathbf{K} \;=\; \mathbf{F}^{\mathsf H}\,\operatorname{diag}(\boldsymbol{\mu})\,\mathbf{F}.\]

The grid spectrum \(\boldsymbol{\mu}\) is analytic in the kernel hyper-parameters (bandwidth, \(\nu\), \(\rho\), neighbour structure), so no eigendecomposition is needed. Matvecs become element-wise products at the cost of one \(\mathcal{O}(n \log n)\) FFT:

\[\mathbf{K}\mathbf{v} \;=\; \tfrac{1}{n}\,\operatorname{vec}\bigl(\boldsymbol{\mu} \odot (\mathcal{F}\mathbf{v})\bigr).\]

Irregular coordinates (NUFFT). For non-uniform \(\{\mathbf{s}_i\}\), we approximate \(\mathbf{K}\) with an oversampled \(n'\)-grid (\(n' > n\)):

\[\mathbf{K} \;\approx\; \tfrac{1}{n'}\, \mathbf{U}\,\operatorname{diag}(\boldsymbol{\mu})\,\mathbf{U}^{\mathsf H}, \qquad \mathbf{U} \in \mathbb{C}^{n \times n'}.\]

Matvecs are a type-1 / type-2 NUFFT round-trip at cost \(\mathcal{O}(n' \log n')\):

\[\mathbf{K}\mathbf{v} \;=\; \frac{1}{n'}\, \overbrace{\mathbf{U}\bigl(\boldsymbol{\mu} \odot \underbrace{\mathbf{U}^{\mathsf H}\mathbf{v}}_{\text{type-1}}\bigr)}^{\text{type-2}}.\]

Because \(\mathbf{U}\) is non-unitary, eigenvalues of \(\mathbf{K}\) do not strictly follow \(\boldsymbol{\mu}\). The first two cumulants still have a closed form via Toeplitz FFT convolutions. With \(\mathbf{G} := \mathbf{U}^{\mathsf H}\mathbf{U}\), \(G_{k,k'} = n\,\phi(k'-k)\), and \(\phi(\boldsymbol{\Delta}) = \tfrac{1}{n}\sum_i e^{\mathrm{i}\boldsymbol{\Delta}\cdot\mathbf{s}_i}\):

\[\begin{split}c_1 &= \frac{n}{n'} \sum_{k} \mu_k \quad \text{(exact, since } \operatorname{diag}(\mathbf{G}) = n\mathbf{I}\text{),} \\ c_2 &= \frac{n^2}{(n')^2}\, \boldsymbol{\mu}^{\mathsf{H}}\, \boldsymbol{\Psi}\, \boldsymbol{\mu}, \qquad \Psi_{k,k'} = |\phi(k'-k)|^{2}.\end{split}\]

\(c_2\) is evaluated by two 2-D FFTs (one on \(|\phi|^2\), one on \(\boldsymbol{\mu}\)) at \(\mathcal{O}(n' \log n')\).

Note

Graph kernels under NUFFT define a different test. NUFFT implicitly builds a grid-stencil kernel on the auxiliary grid, which is topologically different from an exact \(k\)-NN graph on \(\{\mathbf{s}_i\}\) with open boundaries. So \(\mathbf{K}^{\text{NUFFT}}\) is not an approximation of \(\mathbf{K}^{\text{sparse}}\), and their \(Q_n\) statistics are not expected to match numerically.

This does not invalidate the NUFFT Q-test. It is just a different Q-test, evaluated on the grid-stencil operator rather than on the \(k\)-NN graph. It is:

well-defined (the NUFFT operator is real, symmetric, and translation-invariant by construction);
consistent whenever its effective spectrum is non-negative (e.g. "car", "graph_laplacian", and "moran" under CLT through the null approximations above);
powerful, at \(\mathcal{O}((n + n') \log n')\) per feature with no \(k\)-NN construction step, so it scales to \(n\) where the sparse-graph path cannot.

Distance-based kernels (Gaussian, Matérn) are unaffected. Their NUFFT and Matrix forms converge to the same translation-invariant operator as the oversampling factor grows.

Complexity summary#

Computational complexity of \(Q_n\) scaling strategies.#
Backend	Precompute	Memory	Time per \(Q\)	Key limitation
Dense + Davies (baseline)	\(\mathcal{O}(n^3)\)	\(\mathcal{O}(n^2)\)	\(\mathcal{O}(n^2) + \mathcal{O}(Kn)\)	Intractable for large \(n\).
Dense + Liu / Welch / CLT [1]	\(\mathcal{O}(n^2)\)	\(\mathcal{O}(n^2)\)	\(\mathcal{O}(n^2)\)	Inaccurate right tail.
Sparse kernel (\(k\)-NN)	\(\mathcal{O}(nk)\)	\(\mathcal{O}(nk)\)	\(\mathcal{O}(nk)\)	Power loss (cancellation).
Low-rank kernel (rank-\(r\))	\(\mathcal{O}(nr + r^3)\) [2]	\(\mathcal{O}(nr)\)	\(\mathcal{O}(nr)\)	Power loss (blind spots).
Sparse precision (CAR)	\(\mathcal{O}(n^{3/2})\)	\(\mathcal{O}(n \log n)\)	\(\mathcal{O}(n \log n)\)	Triangular solves hard to parallelise.
FFT (full spectrum)	\(\mathcal{O}(n \log n)\)	\(\mathcal{O}(n)\)	\(\mathcal{O}(n \log n)\)	Requires periodic grid.
NUFFT	\(\mathcal{O}((n + n') \log n')\)	\(\mathcal{O}(n')\)	\(\mathcal{O}((n + n') \log n')\)	Mismatch with graph kernels [3].

The per-\(Q\) cost is dominated by the kernel matvec. The null evaluation on top is either Davies inversion (\(\mathcal{O}(Kn)\) per feature) or constant-time Liu / Welch once the cumulants are cached.

Default recommendation#

Continuous kernels on arbitrary coordinates (Gaussian, Matérn). Use NUFFTKernel with Liu. \(\mathcal{O}(n)\) memory, \(\mathcal{O}(n \log n)\) time per feature, no power loss.
Regular rasterised grids (Visium HD, imaging). Use FFTKernel with Liu. Same asymptotic complexity as NUFFT, smaller constant.
General graph kernels (phylogenetic trees, single-cell k-NN graphs). Use MatrixKernel with method="car" (sparse precision). \(\mathcal{O}(n \log n)\) per feature after a \(\mathcal{O}(n^{3/2})\) one-time factorisation.
Small \(n\) (under ~5 000). Any backend will do. The dense matrix path is the simplest.

Scalable Computation

Contents

Scalable Computation#

Null-distribution approximations#

Structured and sparse kernel operators#

Fourier-accelerated methods#

Complexity summary#

Default recommendation#

See also#