Cross-sample Comparison#

Suppose you have two groups of spatial-omics samples (for example a set of healthy controls and a set of cancer sections) and want to ask which genes show the biggest spatial-pattern difference between the groups. The ComparatorIrregular and ComparatorGrid classes give you a frequency-domain pipeline that does this without spatial registration. The default null is an analytic Liu-approximation Wald test (null="wald"); a label-permutation null is available on the binary path with null="permutation". The GLM (multi-column / continuous design) path is Wald-only.

Note

This API is under active development. Signatures may shift between minor releases.

Why frequency domain?#

The 2-D power spectrum \(|\hat x(k)|^2\) of a rasterised gene image is translation-invariant: shifting the image leaves the spectrum unchanged. Radial averaging additionally makes the representation rotation-invariant. Together these mean samples never need to be spatially registered onto each other, which would otherwise be a hard requirement when (for example) healthy and cancer slides have no shared anatomy.

Five-step pipeline#

ComparatorIrregular chains five stages:

  1. Per-sample 2-D power spectra (compute_spectra).

  2. Reduction to a low-dimensional feature vector. The default is radial 1-D bins.

  3. Background normalisation that cancels per-slide differences in gain and sensitivity (normalize_background — geometric-mean spectrum across all genes per sample).

  4. Optional residualisation against covariate spectra (normalize_covariates: cell-type maps, tissue-domain indicators, …).

  5. Per-gene two-group / GLM-contrast test (test_diff_freq(design, ...)) with BH-FDR correction. The design argument is supplied at test time, not at construction, so a single fitted comparator can serve any number of unrelated comparisons on the same spectra.

DC vs AC: separating expression level from pattern shape

The pipeline always mean-centres each gene’s spatial signal before the FFT, splitting the information cleanly into two orthogonal pieces.

The DC scalar is the per-sample grid mean, i.e. total normalised expression. It is tested across groups with test_diff_expr(), which runs an analytic Welch-Satterthwaite t-test by default (null="wald"; null="permutation" is also available) with BH-FDR. This is a spatially-aware differential-expression test.

The AC spectrum is the pattern shape, with DC exactly zero. It is tested with test_diff_freq() using one of the two statistics listed below.

The two tests carry complementary information. A gene may be “only DE” (same pattern, different magnitude), “only pattern” (same total expression, different spatial layout), or both. Run them side by side and inspect where the hits overlap and where they separate.

The two pattern-test statistics ship out of the box and share a common dispatch, so they are directly comparable:

Statistic

What it measures

log_l2 (default)

Quadratic form = D'WD on log-spectra differences. Supports analytic null="wald" (Liu mixture-χ² tail; the default on test_diff_freq()) and null="permutation" on the binary path. The Wald null bypasses the BH-FDR floor that the exact permutation test hits at small per-arm n, and is the only path that works on multi-column / continuous designs.

welch_t_cauchy

Cauchy combination of per-bin Welch t-statistics. Analytic null is built in; remains well-calibrated at very small n. Binary path only.

Both run through quadsv.comparators.multisample.compare_two_groups() (or quadsv.ComparatorIrregular.test_diff_freq() for the class API); flip statistic="log_l2"statistic="welch_t_cauchy" to compare on the same fitted spectra.

Picking a class#

Two backends, mirroring the detector layer:

  • ComparatorIrregular takes a list of anndata.AnnData (irregular spots, common across Visium, Slide-seq, Stereo-seq, MERFISH). Spectra are computed with a batched type-1 NUFFT. Each sample keeps its own grid shape and spacing. Cross-sample comparability comes from radial binning in physical-frequency space.

  • ComparatorGrid takes a list of spatialdata.SpatialData (regular rasterised bins, e.g. Visium HD). Spectra are computed with a single batched 2-D FFT per sample.

Sparse adata.X and layer matrices are not densified up front. The spectrum loop converts exactly one gene column at a time. The Comparator() factory dispatches between the two classes based on the input list type. Mixed lists raise TypeError.

Toy walkthrough (AnnData / NUFFT)#

Eight synthetic samples (4 per group) of 10 genes. Gene g0 carries a low-frequency stripe pattern in group 1 only.

import anndata as ad
import numpy as np
from quadsv import ComparatorIrregular

rng = np.random.default_rng(3)
ny = nx = 32
gene_names = [f"g{i}" for i in range(10)]

def make_adata(group: int) -> ad.AnnData:
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    coords = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    X = rng.standard_normal((ny * nx, len(gene_names))) * 0.1
    if group == 1:
        X[:, 0] += 1.5 * np.sin(2 * np.pi * yy / 16.0).ravel()
    a = ad.AnnData(X=X)
    a.var_names = gene_names
    a.obsm["spatial"] = coords
    return a

samples = [make_adata(0) for _ in range(4)] + [make_adata(1) for _ in range(4)]
design = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1-D labels → binary contrast

cmp = (
    ComparatorIrregular(samples, gene_names)
    .compute_spectra()
    .normalize_background()
)
# Default null="wald" → analytic Liu-approximation Wald test.
# Add null="permutation", n_perm=300, random_state=0 for the
# label-permutation alternative.
results = cmp.test_diff_freq(design, statistic="log_l2")
print(results.head())

The implanted gene g0 ranks first in the resulting table.

Walkthrough (SpatialData / FFT)#

For rasterised-grid samples, swap in ComparatorGrid and pass the same bin / table / coord keys you would pass to DetectorGrid:

import spatialdata as sd
from quadsv import ComparatorGrid

samples_sd = [sd.read_zarr(p) for p in paths_by_group]
design = np.array([0] * len(paths_a) + [1] * len(paths_b))
cmp = ComparatorGrid(
    samples_sd,
    bins="bin_shapes",            # SpatialElement name shared by every sdata
    table_name="counts",          # table inside each sdata
    col_key="array_col",          # obs column with bin-column indices
    row_key="array_row",          # obs column with bin-row indices
    value_key=None,               # None means rasterise expression off .X
    fft_chunk_size=256,           # genes per batched scipy.fft call
).compute_spectra().normalize_background()
cmp.test_diff_freq(design, statistic="log_l2")

Mixed coordinate units (NUFFT path)#

When samples ship coordinates in different physical units

Some pipelines store coordinates in mixed units. For example one slide may be in μm and another in Visium full-resolution pixels at 0.35 μm/pixel. Pass unit_scales to convert each sample’s raw coords into the common unit. Radial bins then come out in cycles per that unit on every sample:

cmp = ComparatorIrregular(
    samples,
    gene_names=gene_names,
    unit_scales=[1.0, 0.35, 1.0, 0.35],
    spacing=(50.0, 50.0),       # common physical spacing, μm
    n_radial_bins=30,
).compute_spectra().normalize_background()
cmp.test_diff_freq(design, statistic="log_l2")

If grid_shape and spacing are left unset, each sample’s k-grid is auto-inferred from its coords via quadsv.kernels.nufft._infer_grid_from_coords(). quadsv.kernels.nufft.power_spectrum_2d_nufft() is the lower-level primitive that runs one sample at a time.

Visium hex grids#

For 10x Visium slides, quadsv.utils.load_visium_sample() reads a Space Ranger output directory into an anndata.AnnData. You can feed that AnnData directly to ComparatorIrregular. The NUFFT backend handles the hex layout natively, no manual rasterisation needed. If you do want the explicit hex-to-grid rasterisation, quadsv.utils.visium_to_grid() returns a (n_genes, 78, 128) array and the physical spacing (dy, dx) = (100·√3/2, 50) μm per cell for v1 Visium. The smallest resolvable pattern is roughly 2 · 86.6 μm 173 μm along the coarser axis (the Nyquist limit).

Choosing covariate maps for residualisation#

normalize_covariates() takes one of two shapes — a sequence of column-name strings shared across samples, or a sequence of per-sample image arrays:

# 1. Shared column names — interpreted by the subclass.
#    ComparatorIrregular: each key is looked up in adata.obs.columns
#    first, then adata.var_names — so the same call accepts
#    deconvolved cell-type proportion columns *and* per-gene
#    expression columns (housekeeping / marker genes) interchangeably.
cmp.normalize_covariates(["celltype_astro", "celltype_neuron", "MALAT1"])

# ComparatorGrid: forward as `value_key=` to spatialdata.rasterize_bins
# (works for .obs columns AND var_names in the comparator's table).
cmp_g.normalize_covariates(["region_label", "MALAT1"])

# 2. Pre-rasterized per-sample arrays of shape (n_covariates, ny_s, nx_s).
#    Universal: works on either subclass; use when covariates aren't
#    already attached to the sample containers.
cmp.normalize_covariates(per_sample_arrays)

Useful covariate candidates:

  • Cell-type proportion maps from a deconvolution tool such as Cell2location, CARD, or RCTD. One channel per cell type.

  • Tissue-domain indicator maps from a spatial clustering method such as BayesSpace or GraphST.

  • A composite “housekeeping” expression image to absorb depth gradients.

Residualisation is applied after background normalisation and before testing.

See also#