Cross-sample Comparison

Cross-sample Comparison#

Suppose you have two groups of spatial-omics samples (for example a set of healthy controls and a set of cancer sections) and want to ask which genes show the biggest spatial-pattern difference between the groups. The ComparatorIrregular and ComparatorGrid classes give you a frequency-domain pipeline that does this without spatial registration. The default null is an analytic Liu-approximation Wald test (null="wald"); a label-permutation null is available on the binary path with null="permutation". The GLM (multi-column / continuous design) path is Wald-only.

Note

This API is under active development. Signatures may shift between minor releases.

Why frequency domain?#

The 2-D power spectrum \(|\hat x(k)|^2\) of a rasterised gene image is translation-invariant: shifting the image leaves the spectrum unchanged. Radial averaging additionally makes the representation rotation-invariant. Together these mean samples never need to be spatially registered onto each other, which would otherwise be a hard requirement when (for example) healthy and cancer slides have no shared anatomy.

Five-step pipeline#

ComparatorIrregular chains five stages:

Per-sample 2-D power spectra (compute_spectra).
Reduction to a low-dimensional feature vector. The default is radial 1-D bins.
Background normalisation that cancels per-slide differences in gain and sensitivity (normalize_background — geometric-mean spectrum across all genes per sample).
Optional residualisation against covariate spectra (normalize_covariates: cell-type maps, tissue-domain indicators, …).
Per-gene two-group / GLM-contrast test (test_diff_freq(design, ...)) with BH-FDR correction. The design argument is supplied at test time, not at construction, so a single fitted comparator can serve any number of unrelated comparisons on the same spectra.

The two pattern-test statistics ship out of the box and share a common dispatch, so they are directly comparable:

Statistic	What it measures
`log_l2` (default)	Quadratic form `T² = D'WD` on log-spectra differences. Supports analytic `null="wald"` (Liu mixture-χ² tail; the default on `test_diff_freq()`) and `null="permutation"` on the binary path. The Wald null bypasses the BH-FDR floor that the exact permutation test hits at small per-arm n, and is the only path that works on multi-column / continuous designs.
`welch_t_cauchy`	Cauchy combination of per-bin Welch t-statistics. Analytic null is built in; remains well-calibrated at very small n. Binary path only.

Both run through quadsv.comparators.multisample.compare_two_groups() (or quadsv.ComparatorIrregular.test_diff_freq() for the class API); flip statistic="log_l2" ↔ statistic="welch_t_cauchy" to compare on the same fitted spectra.

Picking a class#

Two backends, mirroring the detector layer:

ComparatorIrregular takes a list of anndata.AnnData (irregular spots, common across Visium, Slide-seq, Stereo-seq, MERFISH). Spectra are computed with a batched type-1 NUFFT. Each sample keeps its own grid shape and spacing. Cross-sample comparability comes from radial binning in physical-frequency space.
ComparatorGrid takes a list of spatialdata.SpatialData (regular rasterised bins, e.g. Visium HD). Spectra are computed with a single batched 2-D FFT per sample.

Sparse adata.X and layer matrices are not densified up front. The spectrum loop converts exactly one gene column at a time. The Comparator() factory dispatches between the two classes based on the input list type. Mixed lists raise TypeError.

Toy walkthrough (AnnData / NUFFT)#

Eight synthetic samples (4 per group) of 10 genes. Gene g0 carries a low-frequency stripe pattern in group 1 only.

import anndata as ad
import numpy as np
from quadsv import ComparatorIrregular

rng = np.random.default_rng(3)
ny = nx = 32
gene_names = [f"g{i}" for i in range(10)]

def make_adata(group: int) -> ad.AnnData:
    yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
    coords = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
    X = rng.standard_normal((ny * nx, len(gene_names))) * 0.1
    if group == 1:
        X[:, 0] += 1.5 * np.sin(2 * np.pi * yy / 16.0).ravel()
    a = ad.AnnData(X=X)
    a.var_names = gene_names
    a.obsm["spatial"] = coords
    return a

samples = [make_adata(0) for _ in range(4)] + [make_adata(1) for _ in range(4)]
design = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1-D labels → binary contrast

cmp = (
    ComparatorIrregular(samples, gene_names)
    .compute_spectra()
    .normalize_background()
)
# Default null="wald" → analytic Liu-approximation Wald test.
# Add null="permutation", n_perm=300, random_state=0 for the
# label-permutation alternative.
results = cmp.test_diff_freq(design, statistic="log_l2")
print(results.head())

The implanted gene g0 ranks first in the resulting table.

Walkthrough (SpatialData / FFT)#

For rasterised-grid samples, swap in ComparatorGrid and pass the same bin / table / coord keys you would pass to DetectorGrid:

import spatialdata as sd
from quadsv import ComparatorGrid

samples_sd = [sd.read_zarr(p) for p in paths_by_group]
design = np.array([0] * len(paths_a) + [1] * len(paths_b))
cmp = ComparatorGrid(
    samples_sd,
    bins="bin_shapes",            # SpatialElement name shared by every sdata
    table_name="counts",          # table inside each sdata
    col_key="array_col",          # obs column with bin-column indices
    row_key="array_row",          # obs column with bin-row indices
    value_key=None,               # None means rasterise expression off .X
    fft_chunk_size=256,           # genes per batched scipy.fft call
).compute_spectra().normalize_background()
cmp.test_diff_freq(design, statistic="log_l2")

Mixed coordinate units (NUFFT path)#

Visium hex grids#

For 10x Visium slides, quadsv.utils.load_visium_sample() reads a Space Ranger output directory into an anndata.AnnData. You can feed that AnnData directly to ComparatorIrregular. The NUFFT backend handles the hex layout natively, no manual rasterisation needed. If you do want the explicit hex-to-grid rasterisation, quadsv.utils.visium_to_grid() returns a (n_genes, 78, 128) array and the physical spacing (dy, dx) = (100·√3/2, 50) μm per cell for v1 Visium. The smallest resolvable pattern is roughly 2 · 86.6 μm ≈ 173 μm along the coarser axis (the Nyquist limit).

Choosing covariate maps for residualisation#

normalize_covariates() takes one of two shapes — a sequence of column-name strings shared across samples, or a sequence of per-sample image arrays:

# 1. Shared column names — interpreted by the subclass.
#    ComparatorIrregular: each key is looked up in adata.obs.columns
#    first, then adata.var_names — so the same call accepts
#    deconvolved cell-type proportion columns *and* per-gene
#    expression columns (housekeeping / marker genes) interchangeably.
cmp.normalize_covariates(["celltype_astro", "celltype_neuron", "MALAT1"])

# ComparatorGrid: forward as `value_key=` to spatialdata.rasterize_bins
# (works for .obs columns AND var_names in the comparator's table).
cmp_g.normalize_covariates(["region_label", "MALAT1"])

# 2. Pre-rasterized per-sample arrays of shape (n_covariates, ny_s, nx_s).
#    Universal: works on either subclass; use when covariates aren't
#    already attached to the sample containers.
cmp.normalize_covariates(per_sample_arrays)

Useful covariate candidates:

Cell-type proportion maps from a deconvolution tool such as Cell2location, CARD, or RCTD. One channel per cell type.
Tissue-domain indicator maps from a spatial clustering method such as BayesSpace or GraphST.
A composite “housekeeping” expression image to absorb depth gradients.

Residualisation is applied after background normalisation and before testing.