Cross-sample Comparison#
Suppose you have two groups of spatial-omics samples (for example a
set of healthy controls and a set of cancer sections) and want to
ask which genes show the biggest spatial-pattern difference between
the groups. The ComparatorIrregular and
ComparatorGrid classes give you a frequency-domain
pipeline that does this without spatial registration. The default
null is an analytic Liu-approximation Wald test (null="wald");
a label-permutation null is available on the binary path with
null="permutation". The GLM (multi-column / continuous design)
path is Wald-only.
Note
This API is under active development. Signatures may shift between minor releases.
Why frequency domain?#
The 2-D power spectrum \(|\hat x(k)|^2\) of a rasterised gene image is translation-invariant: shifting the image leaves the spectrum unchanged. Radial averaging additionally makes the representation rotation-invariant. Together these mean samples never need to be spatially registered onto each other, which would otherwise be a hard requirement when (for example) healthy and cancer slides have no shared anatomy.
Five-step pipeline#
ComparatorIrregular chains five stages:
Per-sample 2-D power spectra (
compute_spectra).Reduction to a low-dimensional feature vector. The default is radial 1-D bins.
Background normalisation that cancels per-slide differences in gain and sensitivity (
normalize_background— geometric-mean spectrum across all genes per sample).Optional residualisation against covariate spectra (
normalize_covariates: cell-type maps, tissue-domain indicators, …).Per-gene two-group / GLM-contrast test (
test_diff_freq(design, ...)) with BH-FDR correction. Thedesignargument is supplied at test time, not at construction, so a single fitted comparator can serve any number of unrelated comparisons on the same spectra.
DC vs AC: separating expression level from pattern shape
The pipeline always mean-centres each gene’s spatial signal before the FFT, splitting the information cleanly into two orthogonal pieces.
The DC scalar is the per-sample grid mean, i.e. total
normalised expression. It is tested across groups with
test_diff_expr(), which runs
an analytic Welch-Satterthwaite t-test by default
(null="wald"; null="permutation" is also available) with
BH-FDR. This is a spatially-aware differential-expression test.
The AC spectrum is the pattern shape, with DC exactly zero.
It is tested with
test_diff_freq() using one of
the two statistics listed below.
The two tests carry complementary information. A gene may be “only DE” (same pattern, different magnitude), “only pattern” (same total expression, different spatial layout), or both. Run them side by side and inspect where the hits overlap and where they separate.
The two pattern-test statistics ship out of the box and share a common dispatch, so they are directly comparable:
Statistic |
What it measures |
|---|---|
|
Quadratic form |
|
Cauchy combination of per-bin Welch t-statistics. Analytic null is built in; remains well-calibrated at very small n. Binary path only. |
Both run through quadsv.comparators.multisample.compare_two_groups()
(or quadsv.ComparatorIrregular.test_diff_freq() for the class API);
flip statistic="log_l2" ↔ statistic="welch_t_cauchy" to compare on
the same fitted spectra.
Picking a class#
Two backends, mirroring the detector layer:
ComparatorIrregulartakes a list ofanndata.AnnData(irregular spots, common across Visium, Slide-seq, Stereo-seq, MERFISH). Spectra are computed with a batched type-1 NUFFT. Each sample keeps its own grid shape and spacing. Cross-sample comparability comes from radial binning in physical-frequency space.ComparatorGridtakes a list ofspatialdata.SpatialData(regular rasterised bins, e.g. Visium HD). Spectra are computed with a single batched 2-D FFT per sample.
Sparse adata.X and layer matrices are not densified up front.
The spectrum loop converts exactly one gene column at a time. The
Comparator() factory dispatches between the two
classes based on the input list type. Mixed lists raise
TypeError.
Toy walkthrough (AnnData / NUFFT)#
Eight synthetic samples (4 per group) of 10 genes. Gene g0
carries a low-frequency stripe pattern in group 1 only.
import anndata as ad
import numpy as np
from quadsv import ComparatorIrregular
rng = np.random.default_rng(3)
ny = nx = 32
gene_names = [f"g{i}" for i in range(10)]
def make_adata(group: int) -> ad.AnnData:
yy, xx = np.meshgrid(np.arange(ny), np.arange(nx), indexing="ij")
coords = np.stack([yy.ravel(), xx.ravel()], axis=1).astype(float)
X = rng.standard_normal((ny * nx, len(gene_names))) * 0.1
if group == 1:
X[:, 0] += 1.5 * np.sin(2 * np.pi * yy / 16.0).ravel()
a = ad.AnnData(X=X)
a.var_names = gene_names
a.obsm["spatial"] = coords
return a
samples = [make_adata(0) for _ in range(4)] + [make_adata(1) for _ in range(4)]
design = np.array([0, 0, 0, 0, 1, 1, 1, 1]) # 1-D labels → binary contrast
cmp = (
ComparatorIrregular(samples, gene_names)
.compute_spectra()
.normalize_background()
)
# Default null="wald" → analytic Liu-approximation Wald test.
# Add null="permutation", n_perm=300, random_state=0 for the
# label-permutation alternative.
results = cmp.test_diff_freq(design, statistic="log_l2")
print(results.head())
The implanted gene g0 ranks first in the resulting table.
Walkthrough (SpatialData / FFT)#
For rasterised-grid samples, swap in ComparatorGrid
and pass the same bin / table / coord keys you would pass to
DetectorGrid:
import spatialdata as sd
from quadsv import ComparatorGrid
samples_sd = [sd.read_zarr(p) for p in paths_by_group]
design = np.array([0] * len(paths_a) + [1] * len(paths_b))
cmp = ComparatorGrid(
samples_sd,
bins="bin_shapes", # SpatialElement name shared by every sdata
table_name="counts", # table inside each sdata
col_key="array_col", # obs column with bin-column indices
row_key="array_row", # obs column with bin-row indices
value_key=None, # None means rasterise expression off .X
fft_chunk_size=256, # genes per batched scipy.fft call
).compute_spectra().normalize_background()
cmp.test_diff_freq(design, statistic="log_l2")
Mixed coordinate units (NUFFT path)#
When samples ship coordinates in different physical units
Some pipelines store coordinates in mixed units. For example one
slide may be in μm and another in Visium full-resolution pixels
at 0.35 μm/pixel. Pass unit_scales to convert each sample’s
raw coords into the common unit. Radial bins then come out in
cycles per that unit on every sample:
cmp = ComparatorIrregular(
samples,
gene_names=gene_names,
unit_scales=[1.0, 0.35, 1.0, 0.35],
spacing=(50.0, 50.0), # common physical spacing, μm
n_radial_bins=30,
).compute_spectra().normalize_background()
cmp.test_diff_freq(design, statistic="log_l2")
If grid_shape and spacing are left unset, each sample’s
k-grid is auto-inferred from its coords via
quadsv.kernels.nufft._infer_grid_from_coords().
quadsv.kernels.nufft.power_spectrum_2d_nufft() is the
lower-level primitive that runs one sample at a time.
Visium hex grids#
For 10x Visium slides,
quadsv.utils.load_visium_sample() reads a Space Ranger output
directory into an anndata.AnnData. You can feed that
AnnData directly to
ComparatorIrregular. The NUFFT backend handles the
hex layout natively, no manual rasterisation needed. If you do want
the explicit hex-to-grid rasterisation,
quadsv.utils.visium_to_grid() returns a (n_genes, 78, 128)
array and the physical spacing (dy, dx) = (100·√3/2, 50) μm per
cell for v1 Visium. The smallest resolvable pattern is roughly
2 · 86.6 μm ≈ 173 μm along the coarser axis (the Nyquist
limit).
Choosing covariate maps for residualisation#
normalize_covariates() takes one
of two shapes — a sequence of column-name strings shared across
samples, or a sequence of per-sample image arrays:
# 1. Shared column names — interpreted by the subclass.
# ComparatorIrregular: each key is looked up in adata.obs.columns
# first, then adata.var_names — so the same call accepts
# deconvolved cell-type proportion columns *and* per-gene
# expression columns (housekeeping / marker genes) interchangeably.
cmp.normalize_covariates(["celltype_astro", "celltype_neuron", "MALAT1"])
# ComparatorGrid: forward as `value_key=` to spatialdata.rasterize_bins
# (works for .obs columns AND var_names in the comparator's table).
cmp_g.normalize_covariates(["region_label", "MALAT1"])
# 2. Pre-rasterized per-sample arrays of shape (n_covariates, ny_s, nx_s).
# Universal: works on either subclass; use when covariates aren't
# already attached to the sample containers.
cmp.normalize_covariates(per_sample_arrays)
Useful covariate candidates:
Cell-type proportion maps from a deconvolution tool such as Cell2location, CARD, or RCTD. One channel per cell type.
Tissue-domain indicator maps from a spatial clustering method such as BayesSpace or GraphST.
A composite “housekeeping” expression image to absorb depth gradients.
Residualisation is applied after background normalisation and before testing.
See also#
Quick Start for the single-sample workflow.
Scalable Computation for how the FFT and NUFFT routines scale.
quadsv.ComparatorIrregularandquadsv.ComparatorGridfor the class reference.quadsv.comparators.multisample.compare_two_groups(),quadsv.comparators.multisample.compare_two_groups_masked(), andquadsv.comparators.multisample.compare_glm()for the array-level primitives.quadsv.kernels.fft.power_spectrum_2d()andquadsv.kernels.nufft.power_spectrum_2d_nufft()for the spectrum primitives.