Scalable Computation#

Evaluating \(Q_n = \mathbf{z}^\top \tilde{\mathbf{K}} \mathbf{z}\) and its p-value naively requires the dense \(n \times n\) matrix \(\tilde{\mathbf{K}}\) and its full spectrum, which costs \(\mathcal{O}(n^3)\) time and \(\mathcal{O}(n^2)\) memory. Modern spatial-omics datasets (with \(n\) from \(10^3\) to \(10^7\) spots and \(m\) from \(10^3\) to \(10^4\) features) make that intractable.

quadsv replaces the dense baseline with scalable operators that never materialise \(\tilde{\mathbf{K}}\) and never run a full eigendecomposition. This page summarises the three orthogonal axes of the design: null-distribution approximations, structured or sparse kernel operators, and Fourier-accelerated primitives. It also tabulates their combined complexity.

Notation: \(c_p := \operatorname{tr}(\tilde{\mathbf{K}}^p)\) is the \(p\)-th spectral power sum. Under the null, \(\mathbb{E}[Q_n] = c_1\) and \(\operatorname{Var}[Q_n] = 2 c_2 + \epsilon_{\text{kurtosis}}\).

Null-distribution approximations#

Under \(H_0\), \(Q_n \xrightarrow{d} \sum_{i=1}^n \lambda_i^{(n)} \chi_1^2\). An exact p-value can be obtained by Davies’ inversion of

\[\varphi_{Q_n}(t) \;=\; \prod_{i=1}^n \bigl(1 - 2it\lambda_i^{(n)}\bigr)^{-1/2},\]

but only after paying \(\mathcal{O}(n^3)\) for the full spectrum plus \(\mathcal{O}(Kn)\) per feature, where \(K\) is the number of quadrature nodes. To avoid the eigendecomposition entirely, quadsv ships two moment-matching fits that evaluate in \(\mathcal{O}(1)\) per feature once the cumulants are cached:

  • Welch-Satterthwaite and CLT match \(c_1, c_2\) to a scaled central \(\chi^2\) (Welch, PSD kernels only) or to a Normal (CLT, valid for indefinite \(\tilde{\mathbf{K}}\) too). Both need only \(\operatorname{tr}(\tilde{\mathbf{K}})\) and \(\operatorname{tr}(\tilde{\mathbf{K}}^2)\).

  • Liu’s approximation matches the first four cumulants \(c_1 \ldots c_4\) to a shifted non-central \(\chi^2\). The higher cumulants are estimated via Hutchinson’s trace estimator,

    \[c_p \;=\; \mathbb{E}\bigl[\mathbf{v}^\top \tilde{\mathbf{K}}^p \mathbf{v}\bigr] \;\approx\; \tfrac{1}{V}\,\textstyle\sum_{s=1}^{V} \mathbf{v}_s^\top \tilde{\mathbf{K}}^p \mathbf{v}_s,\]

    with iid Rademacher probes \(\{\mathbf{v}_s\} \stackrel{\text{iid}}{\sim} \{\pm 1\}^n\). Two matvecs per probe deliver all four cumulants from the same probe pool.

Finite-:math:`n` Dirichlet correction. Because \(\mathbf{z} = \mathbf{H}\mathbf{x} / \hat{\sigma}\) uses the sample variance \(\hat{\sigma}^2 = \mathbf{x}^\top \mathbf{H} \mathbf{x} / (n-1)\) in the denominator, and that sample variance is correlated with the numerator, the exact null variance follows a Dirichlet(1/2) ratio:

\[\operatorname{Var}[Q_n] \;=\; \frac{2\,[\,(n-1)\,c_2 - c_1^{2}\,]}{n+1} \;<\; 2 c_2.\]

Welch and Liu both use this corrected variance. Without it, Liu’s right tail collapses to zero on broad-spectrum kernels where \(c_1^{2} \approx (n-1)\,c_2\). CAR on a dense grid is the prototypical failure mode.

Note

Moment-matching’s right tail decays faster than the true null on heavy-tailed spectra, which can over-reject in the extreme tails. Liu is the most tail-accurate of the three, Welch is tighter in the bulk, and CLT is the coarsest. When in doubt, run Liu.

Structured and sparse kernel operators#

For an arbitrary spatial coordinate cloud \(\{\mathbf{s}_i\}\), three strategies let matvecs scale sub-quadratically:

  • Sparse kernels (\(k\)-NN, thresholded distance). \(\mathbf{K}\) has \(\mathcal{O}(nk)\) non-zeros. Matvecs and Hutchinson cumulants cost \(\mathcal{O}(nk)\). Caveat: many ad-hoc sparse kernels (notably Moran’s I) are indefinite, which induces the cancellation pathology and costs power on composite spatial patterns.

  • Low-rank kernels \(\mathbf{K} \approx \mathbf{U}\mathbf{U}^\top\) with \(\mathbf{U} \in \mathbb{R}^{n \times r}\), \(r \ll n\). Matvecs cost \(\mathcal{O}(nr)\). Spectra fall out of an \(\mathcal{O}(r^3)\) eigen-problem on \(\mathbf{U}^\top\mathbf{U}\). Factorisation cost is \(\mathcal{O}(nr^2)\) via Nyström or Random Fourier Features. Caveat: finite rank creates an infinite-dimensional null space (blind spots), so power vanishes on high-frequency patterns.

  • Sparse precision (CAR). The precision matrix \(\mathbf{\Omega} = \mathbf{I} - \rho \widetilde{\mathbf{W}}\) is sparse with \(\mathcal{O}(nk)\) non-zeros even though \(\mathbf{K} = \mathbf{\Omega}^{-1}\) is dense. On typical 2-D graphs a sparse LU (or Cholesky) factorisation of \(\mathbf{\Omega}\) costs \(\mathcal{O}(n^{3/2})\) time with \(\mathcal{O}(n \log n)\) fill-in. Each subsequent \(\mathbf{K}\mathbf{v}\) is an \(\mathcal{O}(n \log n)\) triangular solve. Preserves full rank and positive definiteness.

Fourier-accelerated methods#

Under periodic boundary conditions, any translation-invariant kernel on a regular 2-D or 3-D grid is diagonalised by the Fourier basis:

\[\mathbf{K} \;=\; \mathbf{F}^{\mathsf H}\,\operatorname{diag}(\boldsymbol{\mu})\,\mathbf{F}.\]

The grid spectrum \(\boldsymbol{\mu}\) is analytic in the kernel hyper-parameters (bandwidth, \(\nu\), \(\rho\), neighbour structure), so no eigendecomposition is needed. Matvecs become element-wise products at the cost of one \(\mathcal{O}(n \log n)\) FFT:

\[\mathbf{K}\mathbf{v} \;=\; \tfrac{1}{n}\,\operatorname{vec}\bigl(\boldsymbol{\mu} \odot (\mathcal{F}\mathbf{v})\bigr).\]

Irregular coordinates (NUFFT). For non-uniform \(\{\mathbf{s}_i\}\), we approximate \(\mathbf{K}\) with an oversampled \(n'\)-grid (\(n' > n\)):

\[\mathbf{K} \;\approx\; \tfrac{1}{n'}\, \mathbf{U}\,\operatorname{diag}(\boldsymbol{\mu})\,\mathbf{U}^{\mathsf H}, \qquad \mathbf{U} \in \mathbb{C}^{n \times n'}.\]

Matvecs are a type-1 / type-2 NUFFT round-trip at cost \(\mathcal{O}(n' \log n')\):

\[\mathbf{K}\mathbf{v} \;=\; \frac{1}{n'}\, \overbrace{\mathbf{U}\bigl(\boldsymbol{\mu} \odot \underbrace{\mathbf{U}^{\mathsf H}\mathbf{v}}_{\text{type-1}}\bigr)}^{\text{type-2}}.\]

Because \(\mathbf{U}\) is non-unitary, eigenvalues of \(\mathbf{K}\) do not strictly follow \(\boldsymbol{\mu}\). The first two cumulants still have a closed form via Toeplitz FFT convolutions. With \(\mathbf{G} := \mathbf{U}^{\mathsf H}\mathbf{U}\), \(G_{k,k'} = n\,\phi(k'-k)\), and \(\phi(\boldsymbol{\Delta}) = \tfrac{1}{n}\sum_i e^{\mathrm{i}\boldsymbol{\Delta}\cdot\mathbf{s}_i}\):

\[\begin{split}c_1 &= \frac{n}{n'} \sum_{k} \mu_k \quad \text{(exact, since } \operatorname{diag}(\mathbf{G}) = n\mathbf{I}\text{),} \\ c_2 &= \frac{n^2}{(n')^2}\, \boldsymbol{\mu}^{\mathsf{H}}\, \boldsymbol{\Psi}\, \boldsymbol{\mu}, \qquad \Psi_{k,k'} = |\phi(k'-k)|^{2}.\end{split}\]

\(c_2\) is evaluated by two 2-D FFTs (one on \(|\phi|^2\), one on \(\boldsymbol{\mu}\)) at \(\mathcal{O}(n' \log n')\).

Note

Graph kernels under NUFFT define a different test. NUFFT implicitly builds a grid-stencil kernel on the auxiliary grid, which is topologically different from an exact \(k\)-NN graph on \(\{\mathbf{s}_i\}\) with open boundaries. So \(\mathbf{K}^{\text{NUFFT}}\) is not an approximation of \(\mathbf{K}^{\text{sparse}}\), and their \(Q_n\) statistics are not expected to match numerically.

This does not invalidate the NUFFT Q-test. It is just a different Q-test, evaluated on the grid-stencil operator rather than on the \(k\)-NN graph. It is:

  • well-defined (the NUFFT operator is real, symmetric, and translation-invariant by construction);

  • consistent whenever its effective spectrum is non-negative (e.g. "car", "graph_laplacian", and "moran" under CLT through the null approximations above);

  • powerful, at \(\mathcal{O}((n + n') \log n')\) per feature with no \(k\)-NN construction step, so it scales to \(n\) where the sparse-graph path cannot.

Distance-based kernels (Gaussian, Matérn) are unaffected. Their NUFFT and Matrix forms converge to the same translation-invariant operator as the oversampling factor grows.

Complexity summary#

Computational complexity of \(Q_n\) scaling strategies.#

Backend

Precompute

Memory

Time per \(Q\)

Key limitation

Dense + Davies (baseline)

\(\mathcal{O}(n^3)\)

\(\mathcal{O}(n^2)\)

\(\mathcal{O}(n^2) + \mathcal{O}(Kn)\)

Intractable for large \(n\).

Dense + Liu / Welch / CLT [1]

\(\mathcal{O}(n^2)\)

\(\mathcal{O}(n^2)\)

\(\mathcal{O}(n^2)\)

Inaccurate right tail.

Sparse kernel (\(k\)-NN)

\(\mathcal{O}(nk)\)

\(\mathcal{O}(nk)\)

\(\mathcal{O}(nk)\)

Power loss (cancellation).

Low-rank kernel (rank-\(r\))

\(\mathcal{O}(nr + r^3)\) [2]

\(\mathcal{O}(nr)\)

\(\mathcal{O}(nr)\)

Power loss (blind spots).

Sparse precision (CAR)

\(\mathcal{O}(n^{3/2})\)

\(\mathcal{O}(n \log n)\)

\(\mathcal{O}(n \log n)\)

Triangular solves hard to parallelise.

FFT (full spectrum)

\(\mathcal{O}(n \log n)\)

\(\mathcal{O}(n)\)

\(\mathcal{O}(n \log n)\)

Requires periodic grid.

NUFFT

\(\mathcal{O}((n + n') \log n')\)

\(\mathcal{O}(n')\)

\(\mathcal{O}((n + n') \log n')\)

Mismatch with graph kernels [3].

The per-\(Q\) cost is dominated by the kernel matvec. The null evaluation on top is either Davies inversion (\(\mathcal{O}(Kn)\) per feature) or constant-time Liu / Welch once the cumulants are cached.

Default recommendation#

  • Continuous kernels on arbitrary coordinates (Gaussian, Matérn). Use NUFFTKernel with Liu. \(\mathcal{O}(n)\) memory, \(\mathcal{O}(n \log n)\) time per feature, no power loss.

  • Regular rasterised grids (Visium HD, imaging). Use FFTKernel with Liu. Same asymptotic complexity as NUFFT, smaller constant.

  • General graph kernels (phylogenetic trees, single-cell k-NN graphs). Use MatrixKernel with method="car" (sparse precision). \(\mathcal{O}(n \log n)\) per feature after a \(\mathcal{O}(n^{3/2})\) one-time factorisation.

  • Small \(n\) (under ~5 000). Any backend will do. The dense matrix path is the simplest.

See also#