quadsv.comparators.multisample

quadsv.comparators.multisample#

Cross-sample spatial pattern comparison in the frequency domain.

This module implements an alignment-free, frequency-domain approach for ranking genes by spatial-pattern difference between two groups of spatial-omics samples (e.g., N healthy vs M cancer slides). The key primitive is the 2D power spectrum \(|\hat{x}(k)|^2\) of a rasterized gene image: power spectra are translation-invariant, so samples need not be spatially registered.

Pipeline#

Per-sample spectra — compute_sample_spectrum() runs quadsv.kernels.fft.power_spectrum_2d() on each sample’s (n_genes, ny, nx) array.
Radial binning (default, rotation-invariant) — radial_bin_spectrum() collapses the 2D spectrum onto a K-dim vector indexed by normalized radial frequency, harmonizing samples with different (ny, nx).
(Optional) 2D mode with rotation alignment — align_spectra_by_rotation() rotates each sample’s full 2D spectrum to maximize similarity to a reference, restoring comparability when directional anisotropy matters.
Batch correction — normalize_background() cancels per-slide gain/sensitivity differences; normalize_covariates() regresses out user-supplied covariate spectra (cell-type proportions, tissue domains, etc.); normalize_shape() projects per-(sample, gene) spectra onto the probability simplex along the frequency axis, isolating shape-only redistribution from amplitude differences.
Cross-sample comparison per gene — three dispatch targets share the same per-gene output schema:
- compare_two_groups() (binary 1-D labels) — permutation or analytic Wald (Liu mixture-χ²) null;
- compare_two_groups_masked() (binary + per-(sample, gene) presence mask) — same nulls, masked per-gene cohort;
- compare_glm() (general OLS design + contrast) — Wald null only.
compare_two_groups_scalar() runs the DC-component DE companion on per-(sample, gene) scalars (Welch t analytic or permutation).

This module only contains array-level primitives. The two high-level wrapper classes that drive the pipeline on anndata.AnnData / spatialdata.SpatialData containers live in quadsv.comparators (ComparatorIrregular / ComparatorGrid); their test_diff_freq(design, ...) method dispatches between the three comparison primitives above.

Notes

The default log-L2 statistic is a quadratic form on the log-radial spectrum: take per-group means in log-space, weight the difference by the bin weights W, and report T² = D' W D. At typical study sizes (3–10 slides per group) the exact-permutation test hits a BH-FDR floor; the analytic Wald null (Liu’s χ² mixture approximation against a pooled within-group Σ) bypasses that floor while remaining well-calibrated on real data — see scripts/comparator_benchmark for the calibration battery.

Functions#

`align_spectra_by_rotation`(landmark_spectra, grid_shapes, *)	Two-step rotation alignment: estimate per-sample rotations from
`apply_rotations_to_spectra`(spectra, grid_shapes, ...)	Rotate each sample's 2D power spectra by a per-sample angle.
`compare_glm`(spectra, design, contrast[, gene_names, ...])	Generalized two-group / continuous-covariate test via a design matrix.
`compare_two_groups`(spectra, groups[, gene_names, ...])	Test, for every gene, whether its spatial-pattern spectrum differs between two groups.
`compare_two_groups_masked`(spectra, groups, presence[, ...])	Per-gene two-group pattern test with incomplete data across samples.
`compare_two_groups_scalar`(values, groups[, ...])	Per-gene two-sample test on scalar per-sample values (classical DE).
`compute_sample_spectrum`(sample[, fft_solver, workers, ...])	Compute the 2D power spectrum of every gene in a single sample.
`estimate_rotations_from_landmarks`(landmark_spectra, ...)	Estimate the per-sample rotation that best aligns every landmark
`normalize_background`(spectra, *[, axis, eps])	Cancel per-sample multiplicative gain via cross-gene geometric-mean centring.
`normalize_covariates`(spectra, covariate_spectra, *[, ...])	Residualise log-spectra against the log of covariate spectra.
`normalize_shape`(spectra, *[, axis, eps])	Project each spectrum onto the probability simplex along `axis`.
`radial_bin_spectrum`(spectrum, grid_shape[, n_bins, ...])	Bin a 2D power spectrum into `n_bins` radial frequency bins.

Module Contents#

quadsv.comparators.multisample.align_spectra_by_rotation(landmark_spectra, grid_shapes, *, target_spectra=None, fft_solver='fft2', reference_index=0, n_theta=180, n_radius=64, progress=False)[source]#

Two-step rotation alignment: estimate per-sample rotations from landmark spectra (whose first dimension must match across samples), then apply those rotations to a separate set of target spectra (the full gene panel for each sample, typically a superset of the landmarks).

This is a convenience wrapper around estimate_rotations_from_landmarks() and apply_rotations_to_spectra(). Calling those directly is the right pattern when you want to inspect / cache the per-sample angles before applying them.

Implementation#

For every non-reference sample:

Expand each landmark’s 2D power spectrum to full-fft2 layout, fftshift so DC sits at the centre, and resample onto a polar (n_theta, n_radius) grid.
Compute per-landmark circular cross-correlation along the polar-angle axis against the reference sample’s same-index landmark. Every landmark contributes its own cross-correlation and the per-sample rotation is the angle that maximises the sum across landmarks (and across radii). Mean-template alignment — what the previous implementation did — was strictly weaker because the off-diagonal i ≠ j pair terms in corr(mean(a), mean(b)) are pure noise.
Rotate every entry of target_spectra[i] (if supplied) by the recovered angle.

param landmark_spectra:

Per-sample landmark spectra, shape (n_landmarks, ny, n_kx) per sample. n_landmarks must match across samples.

type landmark_spectra:

sequence of np.ndarray

param grid_shapes:

Per-sample (ny, nx) of the original rasterized image.

type grid_shapes:

sequence of tuple[int, int]

param target_spectra:

Per-sample spectra to which the recovered rotations are applied. Any first-axis dimension (e.g. full gene panel). If None, only the angles are returned.

type target_spectra:

sequence of np.ndarray, optional

param fft_solver:

FFT layout of both inputs. fft2 is recommended so the full angular content is present.

type fft_solver:

{‘fft2’, ‘rfft2’}, default ‘fft2’

param reference_index:

type reference_index:

int, default 0

param n_theta:

type n_theta:

int, default 180

param n_radius:

type n_radius:

int, default 64

param progress:

type progress:

bool, default False

returns:

rotated (list of np.ndarray or None) – Per-sample rotated target spectra (or None when target_spectra is omitted).
angles_deg (np.ndarray) – (n_samples,) recovered rotation angles in degrees. Reference angle is 0.

raises ValueError:

If reference_index is out of range, if landmark_spectra samples disagree on n_landmarks, or if target_spectra length does not match.

Parameters:

landmark_spectra (collections.abc.Sequence[ndarray])
grid_shapes (collections.abc.Sequence[tuple[int, int]])
target_spectra (collections.abc.Sequence[ndarray] | None)
fft_solver (str)
reference_index (int)
n_theta (int)
n_radius (int)
progress (bool)

Return type:

tuple[list[ndarray] | None, ndarray]

quadsv.comparators.multisample.apply_rotations_to_spectra(spectra, grid_shapes, angles_deg, *, fft_solver='fft2', progress=False)[source]#

Rotate each sample’s 2D power spectra by a per-sample angle.

Parameters:

spectra (sequence of np.ndarray) – Per-sample 2D power spectra — any first-axis dimension (e.g. full n_genes). Shape (n, ny, n_kx) with (ny, n_kx) matching fft_solver.
grid_shapes (sequence of tuple[int, int]) – Per-sample (ny, nx) of the original rasterized image.
angles_deg (np.ndarray) – Per-sample rotation angles in degrees (e.g. produced by estimate_rotations_from_landmarks()). Length must equal len(spectra).
fft_solver ({'fft2', 'rfft2'}, default 'fft2') – FFT layout of spectra.
progress (bool, default False) – Show a tqdm bar across samples.

Returns:

rotated – Per-sample rotated spectra with the same shape as the input.

Return type:

list of np.ndarray

Notes

Rotation is done on the 2D power spectrum (fftshifted so DC sits at the centre), not back on the spatial image. That is enough for any downstream analysis that operates on aligned spectra (radial or 2D-bin tests). Samples whose angle is exactly 0 are passed through as-is.

quadsv.comparators.multisample.compare_glm(spectra, design, contrast, gene_names=None, statistic='log_l2', null='wald', freq_weights=None, normalize_shape=False)[source]#

Generalized two-group / continuous-covariate test via a design matrix.

Generalises compare_two_groups() from binary group labels to an arbitrary GLM design matrix and a single-DOF linear contrast. The binary case is recovered exactly by passing design=pd.DataFrame({"group": groups}) and contrast="group"; p-values match compare_two_groups() to machine precision.

Parameters:

spectra (np.ndarray) – (n_samples, n_genes, K) spectral features (raw, not logged).
design (pd.DataFrame or np.ndarray) – Sample-level metadata. DataFrame columns are auto-encoded via patsy (treatment-coded categoricals + intercept); ndarray is passed through as the design matrix verbatim (caller responsible for the intercept column).
contrast (str, dict, or np.ndarray) –
Linear-contrast specification:
- str — name of a design column. Auto-resolves treatment-coded categoricals (e.g., "genotype" matches "genotype[T.TG]"). Multi-DOF (3+ level factor) contrasts must be passed as an explicit dict or ndarray.
- dict[str, float] — coefficient per design column.
- np.ndarray of length p — raw contrast vector.
gene_names (sequence of str, optional)
statistic ({'log_l2'}, default 'log_l2') – Currently only log_l2 is supported in the GLM path.
null ({'wald'}, default 'wald') – Only the analytic Wald-type null is supported here. Permutation nulls for continuous covariates are intentionally deferred (naive row permutation breaks the X-y joint distribution under nuisance covariates; correct alternatives like Freedman–Lane add complexity without a clear payoff over the analytic Wald). Pass null="permutation" only via compare_two_groups() (binary labels) for permutation-based tests.
normalize_shape (bool, default False) – If True, divide each per-(sample, gene) spectrum by its sum along the trailing (frequency) axis before the GLM is fit (same semantics as in compare_two_groups()). Use to isolate shape-only / frequency-redistribution signals along the design contrast — the test then only fires when the relative distribution of power across radial frequencies varies with the contrast, independent of overall amplitude.
freq_weights (ndarray | None)

Returns:

Columns Feature, Statistic, P_value, P_adj — same schema as compare_two_groups().

Return type:

pd.DataFrame

Raises:

ValueError – If shapes are inconsistent or contrast cannot be resolved.
NotImplementedError – If null='permutation' is requested.

quadsv.comparators.multisample.compare_two_groups(spectra, groups, gene_names=None, statistic='log_l2', null='wald', n_perm=1000, random_state=None, n_jobs=1, freq_weights=None, n_perm_max=10000, normalize_shape=False)[source]#

Test, for every gene, whether its spatial-pattern spectrum differs between two groups.

Parameters:

spectra (np.ndarray) – Per-sample spectral features of shape (n_samples, n_genes, K).
groups (np.ndarray) – Group labels of length n_samples taking exactly two distinct values (mapped internally to 0/1 in sorted order).
gene_names (sequence of str, optional) – Names for the gene axis. If None, integer indices are used.
statistic ({'log_l2', 'welch_t_cauchy'}, default 'log_l2') –
Test statistic:
- 'log_l2' — (optionally weighted) L2 distance between mean log-spectra. Global / summary statistic. Pair with null='wald' for an analytic mixture-χ² null that bypasses the small-n permutation BH-floor; null='permutation' (default) falls back to label permutations with exact enumeration when C(n, n_a) ≤ n_perm_max.
- 'welch_t_cauchy' — per-bin Welch two-sided t-test with analytic (t-distribution) p-values combined across bins via Cauchy combination. Analytic is the whole point: permutation p-values would floor at 1/(n_perm + 1) per bin, which would also floor the gene-level combined p-value and destroy BH-FDR power across thousands of genes. Yields an extra P_value_per_bin column.
null ({'wald', 'permutation'}, default 'wald') –
Null-distribution method. 'wald' (the default) uses an analytic Wald-type test for the L2 quadratic form: under H₀ the statistic T² = D'WD is distributed as a weighted sum of χ²₁ variables whose tail is integrated via Liu’s approximation (see quadsv.statistics.liu_sf()). 'permutation' uses the empirical sample-label permutation null and is the only option that respects the n_perm / random_state / n_perm_max arguments. Currently null='wald' is only supported for statistic='log_l2'; raises ValueError otherwise. welch_t_cauchy ignores this argument.

Sample-size guidance (residual df = n_a + n_b - 2):
- df ≥ 4 (n_a + n_b ≥ 6): 'wald' recommended — strong calibration + sensitivity; sweeps the top of our benchmark.
- df ≥ 3 (n_a + n_b ≥ 5): 'wald' acceptable.
- df < 3 (n_a + n_b ≤ 4): 'wald' emits a UserWarning; σ̂² has ≥ 67% relative noise so per-test calibration may be anti-conservative. Prefer statistic='welch_t_cauchy' (per-bin Welch t with proper df-corrected denominator) or stay with null='permutation' if the cohort allows enough exact relabellings.
n_perm (int, default 1000) – Number of label permutations for the null distribution. Ignored when statistic='welch_t_cauchy' or null='wald'.
random_state (int, optional) – Seed for the permutation RNG (ignored for 'welch_t_cauchy').
n_jobs (int, default 1) – Reserved for future parallelism over genes; currently unused (the per-stat implementations are already vectorized over genes).
freq_weights (np.ndarray, optional) – Only used by statistic='log_l2'. Non-negative weights of length K (the number of frequency bins); internally renormalized to sum-1. Lets the user emphasize specific frequencies — e.g., a polynomial low-pass shape to mirror a CAR kernel, or an exponential high-pass shape to mirror a Gaussian kernel. None (default) means uniform weights.
n_perm_max (int, default 10000) – If the total number of distinct two-group relabellings C(n_samples, n_a) is at most this, every possible relabelling is enumerated (exact permutation test) and n_perm is overridden to the enumeration count. This is both faster and strictly more accurate than sampling in the small-sample regime (e.g. 6-vs-6 → 924 partitions, 5-vs-5 → 252). Above the threshold the test falls back to n_perm random shuffles.
normalize_shape (bool, default False) – If True, divide each per-(sample, gene) spectrum by its sum along the trailing (frequency) axis before the statistic is computed (i.e., apply normalize_shape() to spectra first). Use to isolate shape-only / frequency-redistribution signals — the test then only fires when the relative distribution of power across radial frequencies varies with the contrast, independent of overall amplitude. Works with every valid statistic= value.

Returns:

Columns Feature, Statistic, P_value, P_adj (BH-FDR), sorted by descending statistic. When statistic='welch_t_cauchy', the frame also carries a P_value_per_bin object column — each entry is an (K,) numpy array of per-bin permutation p-values for that gene.

Return type:

pd.DataFrame

Raises:

ValueError – If statistic is unknown, groups does not contain exactly two values, or shapes are inconsistent.

quadsv.comparators.multisample.compare_two_groups_masked(spectra, groups, presence, gene_names=None, statistic='log_l2', null='wald', n_perm=1000, random_state=None, min_samples_per_group=2, freq_weights=None, n_perm_max=10000, normalize_shape=False)[source]#

Per-gene two-group pattern test with incomplete data across samples.

For each gene, only the subset of samples with presence[:, g] == True contributes to the observed statistic and to the label-permutation null. Genes that fail to reach min_samples_per_group observations in at least one group are reported with NaN p-values and the number of observed samples per group, so the user sees why they were skipped.

Parameters:

spectra (np.ndarray) – (n_samples, n_genes, K).
groups (np.ndarray) – (n_samples,), exactly two distinct labels.
presence (np.ndarray) – (n_samples, n_genes) boolean mask. True = gene is observed in that sample (contributes); False = gene is absent (ignored).
gene_names (sequence of str, optional)
statistic ({'log_l2', 'welch_t_cauchy'}, default 'log_l2')
null ({'wald', 'permutation'}, default 'wald') –
Null-distribution method. 'wald' (the default) uses an analytic Wald-type test adapted for the masked case via a mask-aware pooled-Σ estimator: a single global (K, K) Σ is accumulated across every gene’s present (sample, gene) cells (each gene contributes n_g - 2 residual degrees of freedom), and per-gene noncentrality scaling v_{c,g} = 1/n_a_g + 1/n_b_g adjusts the eigenvalues for that gene’s specific cohort. Cross-bin correlation structure is taken to be homogeneous across genes (the same A3 assumption used in compare_two_groups() with Wald). Empirical calibration on synthetic missingness up to 50% matches the unmasked Wald path. Currently supported only with statistic='log_l2'.

'permutation' runs a per-gene permutation test, exact-enumerated when C(n_g, n_a_g) ≤ n_perm_max (most genes at small samples).
n_perm (int, default 1000)
random_state (int, optional)
min_samples_per_group (int, default 2) – Minimum observed samples in each group for the gene to be tested.
freq_weights (np.ndarray, optional) – Only consumed by statistic='log_l2' (same semantics as compare_two_groups()).
normalize_shape (bool, default False) – If True, divide each per-(sample, gene) spectrum by its sum along the trailing (frequency) axis before the statistic is computed (same semantics as in compare_two_groups()). Use to isolate shape-only / frequency-redistribution signals. Works with every valid statistic= value.
n_perm_max (int)

Returns:

Columns Feature, Statistic, P_value, P_adj, n_obs_A, n_obs_B. For 'welch_t_cauchy' a P_value_per_bin column is also included (None for skipped genes). BH-FDR is computed only over tested genes.

Return type:

pd.DataFrame

quadsv.comparators.multisample.compare_two_groups_scalar(values, groups, gene_names=None, null='wald', n_perm=1000, random_state=None, n_perm_max=10000)[source]#

Per-gene two-sample test on scalar per-sample values (classical DE).

The natural companion to compare_two_groups(): tested on the DC scalars (per-gene grid means) produced by compute_sample_spectrum(), the result is statistically independent of the spectral pattern test because DC and AC are orthogonal by construction (the FFT pipeline always mean- centres each gene’s grid before computing power).

Two null distributions are supported, chosen via null:

null='wald' (default) — analytic two-sided Welch t-test p-value from the Welch-Satterthwaite t-distribution. No permutation BH-floor; the natural counterpart to compare_two_groups()’s null='wald' analytic path on the spectral side. The Welch t is itself a Wald-type statistic (point estimate / estimated SE under H₁), hence the shared kwarg name across the API surface.
null='permutation' — exact / approximate permutation null on abs(Welch t). More conservative at small n; produces identical p-values as a Mann-Whitney-style rank test up to ties when the permutation pool is exhausted.

Parameters:

values (np.ndarray) – Per-sample per-gene scalars of shape (n_samples, n_genes) — e.g., log-normalized mean expression on each slide.
groups (np.ndarray) – Group labels of length n_samples with exactly two distinct values.
gene_names (sequence of str, optional) – Gene names. Integer indices if None.
null ({'wald', 'permutation'}, default 'wald') – Null-distribution method. 'wald' returns analytic Welch-Satterthwaite t-distribution p-values; 'permutation' runs a label-shuffle null on abs(Welch t).
n_perm (int, default 1000) – Number of sample-label permutations for null='permutation'. Ignored when null='wald'.
random_state (int, optional) – Seed for the permutation RNG. Ignored when null='wald'.
n_perm_max (int, default 10000) – Cap on enumerated unique permutations.

Returns:

Columns Feature, Statistic (abs(Welch t)), Mean_diff (mean_groupA − mean_groupB), P_value, P_adj (BH-FDR), sorted by descending Statistic.

Return type:

pd.DataFrame

Raises:

ValueError – If shapes are inconsistent, groups does not contain exactly two distinct values, or null is unknown.

quadsv.comparators.multisample.compute_sample_spectrum(sample, fft_solver='rfft2', workers=None, return_dc=False)[source]#

Compute the 2D power spectrum of every gene in a single sample.

The spatial signal is mean-centred per gene before the FFT so that the resulting power spectrum carries only the AC component of the pattern — i.e. the k=0 (DC) bin is exactly zero and low-k leakage from per- sample mean shifts is eliminated. The separated DC scalars (the per-sample per-gene grid means) can be returned alongside the spectrum with return_dc=True and are the natural target for a classical differential expression test complementary to the spectral pattern test.

Parameters:

sample (np.ndarray) – Rasterized expression of shape (n_genes, ny, nx).
fft_solver ({'fft2', 'rfft2'}, default 'rfft2') – FFT routine forwarded to quadsv.kernels.fft.power_spectrum_2d().
workers (int, optional) – Parallel workers forwarded to scipy.fft.
return_dc (bool, default False) – If True, also return a (n_genes,) array of per-gene grid means (DC scalars of the uncentered signal).

Returns:

Power spectra of shape (n_genes, ny, n_kx). If return_dc=True, also returns a (n_genes,) DC array.

Return type:

np.ndarray or tuple[np.ndarray, np.ndarray]

Raises:

ValueError – If sample is not 3D.

quadsv.comparators.multisample.estimate_rotations_from_landmarks(landmark_spectra, grid_shapes, *, fft_solver='fft2', reference_index=0, n_theta=180, n_radius=64, progress=False)[source]#

Estimate the per-sample rotation that best aligns every landmark spectrum to the reference sample’s corresponding landmark.

For each non-reference sample the routine picks a single rotation angle that maximises the sum over landmarks of the per-landmark circular cross-correlation along the polar-angle axis — i.e. each landmark aligns to its same-index counterpart in the reference (not to a mean template). This is strictly stronger than mean-template alignment because it ignores cross-landmark noise (the off-diagonal i ≠ j terms that mean-of-means picks up) and picks up anisotropy shared across every landmark at a common orientation.

Parameters:

landmark_spectra (sequence of np.ndarray) – Per-sample landmark spectra. Shape (n_landmarks, ny, n_kx) with (ny, n_kx) following fft_solver. The first dimension (n_landmarks) must match across samples — landmark j in sample A is compared to landmark j in sample B.
grid_shapes (sequence of tuple[int, int]) – Per-sample (ny, nx) of the original rasterized image.
fft_solver ({'fft2', 'rfft2'}, default 'fft2') – FFT layout of landmark_spectra — rfft2 spectra are expanded to full 2D before resampling to preserve angular content.
reference_index (int, default 0) – Which sample’s landmarks act as the rotation reference (its angle is fixed at 0).
n_theta (int, default 180) – Angular resolution of the polar resampling. Recovered angles are accurate to 180 / n_theta degrees.
n_radius (int, default 64) – Radial resolution of the polar resampling.
progress (bool, default False) – If True, show a tqdm bar over non-reference samples.

Returns:

angles_deg – (n_samples,) recovered rotation angles in degrees. Reference angle is exactly 0.

Return type:

np.ndarray

Raises:

ValueError – If reference_index is out of range or any two samples have inconsistent n_landmarks.

quadsv.comparators.multisample.normalize_background(spectra, *, axis=-2, eps=1e-12)[source]#

Cancel per-sample multiplicative gain via cross-gene geometric-mean centring.

For each (sample, frequency-bin) pair, every gene’s power is divided by the geometric mean of the spectrum across the genes axis. Use this to correct per-sample multiplicative gain (sequencing depth, antibody titre, dewaxing efficiency) that scales every gene’s spectrum at every frequency by a sample-level factor.

Parameters:

spectra (np.ndarray) – Non-negative spectra \(P\) with shape (..., G, K) (\(G\) along axis, \(K\) frequency bins on the trailing axis). Any leading dimensions (e.g., n_samples) are broadcast over.
axis (int, default -2) – Axis along which the cross-gene geometric mean is taken (the genes axis).
eps (float, default 1e-12) – Floor \(\varepsilon\) added before the logarithm to keep zeros finite.

Returns:

Background-normalized spectra \(\tilde P\), same shape as spectra. Never mutates the input.

Return type:

np.ndarray

Notes

Let \(P\) denote the input spectrum, \(G\) the number of genes (length of axis), \(K\) the number of frequency bins, and \(\varepsilon\) the eps floor. The per-bin geometric-mean background is

\[b_{k} = \exp\!\Bigl( \tfrac{1}{G} \sum_{g'=1}^{G} \log\bigl(P_{g',k} + \varepsilon\bigr) \Bigr),\]

and the output is the per-bin gene-wise quotient

\[\tilde P_{g,k} = \frac{P_{g,k}}{b_{k}}.\]

Equivalently, in log-space this is per-bin mean centring across the genes axis,

\[\log \tilde P_{g,k} = \log\bigl(P_{g,k} + \varepsilon\bigr) - \tfrac{1}{G} \sum_{g'=1}^{G} \log\bigl(P_{g',k} + \varepsilon\bigr),\]

so after the transform \(\prod_{g} \tilde P_{g,k} = 1\) at every bin \(k\) — the cross-gene geometric mean at every frequency is unity.

The operation is equivalent to a per-bin OLS regression of \(\log P_{\cdot,k}\) against a constant (the cross-gene mean) followed by exponentiation. With a per-sample one-hot covariate stacked across all (sample, gene) rows, the residuals match \(\log \tilde P\) row-for-row, so running this function sample-by-sample is identical to fitting a one-hot sample-ID covariate in log-space and residualising.

Companion functions:

normalize_covariates() removes per-bin bias linear in user-supplied covariate spectra (cell-type proportion maps, tissue domains, housekeeping templates).
normalize_shape() removes per-(sample, gene) amplitude by L1-normalising along the frequency axis.

Examples

>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> spec = rng.lognormal(size=(2, 5, 8))      # (n_samples, G, K)
>>> P_tilde = normalize_background(spec)
>>> P_tilde.shape
(2, 5, 8)
>>> # Cross-gene geometric mean at each (sample, bin) is unity:
>>> bool(np.allclose(np.prod(P_tilde, axis=-2), 1.0))
True

quadsv.comparators.multisample.normalize_covariates(spectra, covariate_spectra, *, fit_intercept=True, eps=1e-12)[source]#

Residualise log-spectra against the log of covariate spectra.

Each gene’s log-spectrum is regressed (per gene, OLS in log-space) on the log of the supplied covariate spectra plus an optional intercept; the function exponentiates and returns the residual spectrum. Use to remove the multiplicative contribution of structured per-bin templates (cell-type proportion maps, tissue-domain indicators, housekeeping composite expression) from every gene’s per-frequency power.

Operating in log-space matches the multiplicative noise model of spectral data, keeps the output strictly positive (so the result composes cleanly with the downstream log_l2 test), and makes this helper commute exactly with normalize_background() (orthogonal projections along orthogonal axes — see Notes).

Parameters:

spectra (np.ndarray) – Non-negative gene spectra \(P\) of shape (G, K) to residualise.
covariate_spectra (np.ndarray) – Non-negative covariate spectra \(C\) of shape (n_cov, K).
fit_intercept (bool, default True) – If True, prepend a column of ones to the design matrix \(X\) so per-gene log-amplitude offsets along the frequency axis are absorbed.
eps (float, default 1e-12) – Floor \(\varepsilon\) added inside \(\log(\cdot)\) on both spectra and covariate_spectra to keep zeros finite.

Returns:

Residual spectra \(\tilde P\) of shape (G, K), strictly positive. Never mutates the input.

Return type:

np.ndarray

Raises:

ValueError – If covariate_spectra has a different last-axis length than spectra.

Notes

Let \(P \in \mathbb{R}_{\geq 0}^{G \times K}\) denote the input spectra (\(G\) genes, \(K\) frequency bins) and \(C \in \mathbb{R}_{\geq 0}^{n_{\mathrm{cov}} \times K}\) the covariate spectra. Build the log-design matrix

\[X = \bigl[\, \mathbf{1}_{K} \;\big|\; \log(C^{\top} + \varepsilon) \,\bigr] \;\in\; \mathbb{R}^{K \times (n_{\mathrm{cov}} + 1)},\]

dropping the leading column \(\mathbf{1}_{K}\) when fit_intercept=False. Fit per-gene OLS coefficients via the Moore-Penrose pseudoinverse \(X^{+}\) against the log of the response,

\[\hat\beta_{g} = X^{+}\, \bigl[ \log( P_{g,\cdot} + \varepsilon ) \bigr]^{\top} \;\in\; \mathbb{R}^{n_{\mathrm{cov}} + 1},\]

and return the exponentiated residual

\[\tilde P_{g,k} = \exp\!\Bigl( \log( P_{g,k} + \varepsilon ) - X_{k,\cdot}\,\hat\beta_{g} \Bigr).\]

Equivalently,

\[\log \tilde P_{g,\cdot}^{\top} = \bigl( I_{K} - X X^{+} \bigr)\, \bigl[ \log( P_{g,\cdot} + \varepsilon ) \bigr]^{\top},\]

i.e., the orthogonal projection of each gene’s log-spectrum onto the orthogonal complement of the column space of \(X\), then exponentiated.

Commutativity with normalize_background(). In log-space the two operations are left- vs right-multiplication of the \(G \times K\) log-spectrum matrix by orthogonal-projection matrices on disjoint axes,

\[\mathrm{bg}: \;\log P \;\mapsto\; \bigl( I_{G} - \tfrac{1}{G}\mathbf{1}_{G}\mathbf{1}_{G}^{\top} \bigr)\,\log P, \qquad \mathrm{cov}: \;\log P \;\mapsto\; \log P \,\bigl( I_{K} - X X^{+} \bigr).\]

Left- and right-multiplication trivially commute, so we have the exact identity normalize_background(normalize_covariates(P)) == normalize_covariates(normalize_background(P)).

With fit_intercept=True and no covariates (empty covariate_spectra), this reduces to per-gene log-mean centring along the frequency axis,

\[\tilde P_{g,k} = \frac{P_{g,k} + \varepsilon} {\exp\!\bigl(\tfrac{1}{K} \sum_{k'=1}^{K}\log(P_{g,k'} + \varepsilon) \bigr)},\]

i.e., dividing each gene’s spectrum by its own cross-bin geometric mean — a per-gene companion to normalize_background()’s per-bin cross-gene operation, distinct from normalize_shape()’s arithmetic-mean / sum-1 normalisation.

Companion functions:

normalize_background() removes per-sample multiplicative gain via cross-gene geometric-mean centring in log-space (perpendicular axis to this function).
normalize_shape() removes per-(sample, gene) amplitude by L1-normalising along the frequency axis.

Examples

>>> import numpy as np
>>> rng  = np.random.default_rng(0)
>>> spec = rng.lognormal(size=(20, 8))      # (G, K)
>>> cov  = rng.lognormal(size=(2, 8))       # (n_cov, K)
>>> resid = normalize_covariates(spec, cov)
>>> resid.shape
(20, 8)
>>> bool((resid > 0).all())     # log-space output is strictly positive
True

quadsv.comparators.multisample.normalize_shape(spectra, *, axis=-1, eps=1e-12)[source]#

Project each spectrum onto the probability simplex along axis.

Each fibre along axis is divided by its L1 norm, so the result is a proper probability distribution over the entries along that axis. Two fibres that differ only by a positive scalar produce identical outputs — only the shape of the power-vs-frequency curve survives, the overall scale is removed.

Parameters:

spectra (np.ndarray) – Non-negative spectra \(P\). Any leading dimensions are preserved; normalisation acts along axis only.
axis (int, default -1) – Axis to L1-normalise along (typically the trailing frequency-bin axis).
eps (float, default 1e-12) – Floor \(\varepsilon\) on the per-fibre sum to avoid division by zero.

Returns:

Shape-normalized spectra \(\tilde P\), same shape as spectra, summing to 1 along axis. Never mutates the input.

Return type:

np.ndarray

Notes

Let \(P\) denote the input spectrum with \(K\) entries along axis and \(\varepsilon\) the eps floor. The output is the per-fibre L1 quotient

\[\tilde P_{\ldots,k} = \frac{P_{\ldots,k}} {\sum_{k'=1}^{K} P_{\ldots,k'} + \varepsilon},\]

so \(\sum_{k} \tilde P_{\ldots,k} = 1\) for every leading-index combination (modulo the \(\varepsilon\) floor; fibres whose total sum is below \(\varepsilon\) are effectively returned unchanged because the numerator dominates the floor).

Equivalently, in log-space this is per-fibre log-sum centring,

\[\log \tilde P_{\ldots,k} = \log P_{\ldots,k} - \log\!\Bigl( \textstyle\sum_{k'=1}^{K} P_{\ldots,k'} + \varepsilon \Bigr).\]

After this transform every fibre is a probability vector over frequency bins, so distances such as Jensen-Shannon and total-variation are well-defined between fibres.

Used internally by the spectrum comparison functions (compare_two_groups(), compare_two_groups_masked(), compare_glm()) when their normalize_shape=True keyword argument is set — the differential-frequency test then fires only on shape redistribution across radial bins, not on overall amplitude changes.

Companion functions:

normalize_background() removes per-sample multiplicative gain via cross-gene geometric-mean centring in log-space.
normalize_covariates() removes per-bin bias linear in user-supplied covariate spectra.

Examples

>>> import numpy as np
>>> x = np.array([[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]])
>>> P_tilde = normalize_shape(x, axis=-1)
>>> bool(np.allclose(P_tilde.sum(axis=-1), 1.0))
True
>>> bool(np.allclose(P_tilde[0], P_tilde[1]))  # only the shape survives
True

quadsv.comparators.multisample.radial_bin_spectrum(spectrum, grid_shape, n_bins=30, fft_solver='rfft2', exclude_dc=True, spacing=None, edges=None)[source]#

Bin a 2D power spectrum into n_bins radial frequency bins.

By default the binning axis is the normalized radial frequency \(k = \sqrt{(k_x/n_x)^2 + (k_y/n_y)^2} \in [0,\,\sqrt{0.5}]\), so spectra from samples with different (ny, nx) map onto the same K bins. Passing spacing=(dy, dx) (in physical units, e.g. μm per cell) switches the binning axis to cycles per unit length (cycles/μm → multiply by 1000 for cycles/mm), so bins are directly comparable across samples with different physical resolutions. In that case, also pass edges to enforce a common bin grid across samples.

Parameters:

spectrum (np.ndarray) – Power spectrum of shape (..., ny, n_kx). Leading dims (e.g., genes, samples) are preserved.
grid_shape (tuple[int, int]) – Original (ny, nx) of the rasterized image (needed because rfft2 only stores half of the kx axis).
n_bins (int, default 30) – Number of radial bins. Ignored when edges is supplied.
fft_solver ({'fft2', 'rfft2'}, default 'rfft2') – FFT solver used to produce spectrum. Must match.
exclude_dc (bool, default True) – If True, drop the zero-frequency (DC) bin from the output.
spacing (tuple[float, float], optional) – Physical spacing (dy, dx) per grid cell (e.g., μm). If given, the binning axis is physical frequency in cycles per unit length.
edges (np.ndarray, optional) – Explicit monotonically increasing bin edges (length n_bins + 1) in the same frequency units as spacing (or normalized if spacing is None). When supplied, this overrides n_bins and gives every sample identical bin boundaries — required for cross-sample comparisons in physical units.

Returns:

Radial spectra of shape (..., n_bins) (or n_bins - 1 when exclude_dc=True).

Return type:

np.ndarray

Raises:

ValueError – If spectrum’s last two dims do not match the expected shape implied by grid_shape and fft_solver.