quadsv.comparators.multisample#
Cross-sample spatial pattern comparison in the frequency domain.
This module implements an alignment-free, frequency-domain approach for ranking genes by spatial-pattern difference between two groups of spatial-omics samples (e.g., N healthy vs M cancer slides). The key primitive is the 2D power spectrum \(|\hat{x}(k)|^2\) of a rasterized gene image: power spectra are translation-invariant, so samples need not be spatially registered.
Pipeline#
Per-sample spectra —
compute_sample_spectrum()runsquadsv.kernels.fft.power_spectrum_2d()on each sample’s(n_genes, ny, nx)array.Radial binning (default, rotation-invariant) —
radial_bin_spectrum()collapses the 2D spectrum onto aK-dim vector indexed by normalized radial frequency, harmonizing samples with different(ny, nx).(Optional) 2D mode with rotation alignment —
align_spectra_by_rotation()rotates each sample’s full 2D spectrum to maximize similarity to a reference, restoring comparability when directional anisotropy matters.Batch correction —
normalize_background()cancels per-slide gain/sensitivity differences;normalize_covariates()regresses out user-supplied covariate spectra (cell-type proportions, tissue domains, etc.);normalize_shape()projects per-(sample, gene) spectra onto the probability simplex along the frequency axis, isolating shape-only redistribution from amplitude differences.Cross-sample comparison per gene — three dispatch targets share the same per-gene output schema:
compare_two_groups()(binary 1-D labels) — permutation or analytic Wald (Liu mixture-χ²) null;compare_two_groups_masked()(binary + per-(sample, gene) presence mask) — same nulls, masked per-gene cohort;compare_glm()(general OLS design + contrast) — Wald null only.
compare_two_groups_scalar()runs the DC-component DE companion on per-(sample, gene) scalars (Welch t analytic or permutation).
This module only contains array-level primitives. The two high-level
wrapper classes that drive the pipeline on anndata.AnnData /
spatialdata.SpatialData containers live in
quadsv.comparators (ComparatorIrregular /
ComparatorGrid); their test_diff_freq(design, ...)
method dispatches between the three comparison primitives above.
Notes
The default log-L2 statistic is a quadratic form on the log-radial
spectrum: take per-group means in log-space, weight the difference
by the bin weights W, and report T² = D' W D. At typical
study sizes (3–10 slides per group) the exact-permutation test hits
a BH-FDR floor; the analytic Wald null (Liu’s χ² mixture
approximation against a pooled within-group Σ) bypasses that floor
while remaining well-calibrated on real data — see
scripts/comparator_benchmark for the calibration battery.
Functions#
|
Two-step rotation alignment: estimate per-sample rotations from |
|
Rotate each sample's 2D power spectra by a per-sample angle. |
|
Generalized two-group / continuous-covariate test via a design matrix. |
|
Test, for every gene, whether its spatial-pattern spectrum differs between two groups. |
|
Per-gene two-group pattern test with incomplete data across samples. |
|
Per-gene two-sample test on scalar per-sample values (classical DE). |
|
Compute the 2D power spectrum of every gene in a single sample. |
|
Estimate the per-sample rotation that best aligns every landmark |
|
Cancel per-sample multiplicative gain via cross-gene geometric-mean centring. |
|
Residualise log-spectra against the log of covariate spectra. |
|
Project each spectrum onto the probability simplex along |
|
Bin a 2D power spectrum into |
Module Contents#
- quadsv.comparators.multisample.align_spectra_by_rotation(landmark_spectra, grid_shapes, *, target_spectra=None, fft_solver='fft2', reference_index=0, n_theta=180, n_radius=64, progress=False)[source]#
Two-step rotation alignment: estimate per-sample rotations from landmark spectra (whose first dimension must match across samples), then apply those rotations to a separate set of target spectra (the full gene panel for each sample, typically a superset of the landmarks).
This is a convenience wrapper around
estimate_rotations_from_landmarks()andapply_rotations_to_spectra(). Calling those directly is the right pattern when you want to inspect / cache the per-sample angles before applying them.Implementation#
For every non-reference sample:
Expand each landmark’s 2D power spectrum to full-fft2 layout, fftshift so DC sits at the centre, and resample onto a polar
(n_theta, n_radius)grid.Compute per-landmark circular cross-correlation along the polar-angle axis against the reference sample’s same-index landmark. Every landmark contributes its own cross-correlation and the per-sample rotation is the angle that maximises the sum across landmarks (and across radii). Mean-template alignment — what the previous implementation did — was strictly weaker because the off-diagonal
i ≠ jpair terms incorr(mean(a), mean(b))are pure noise.Rotate every entry of
target_spectra[i](if supplied) by the recovered angle.
- param landmark_spectra:
Per-sample landmark spectra, shape
(n_landmarks, ny, n_kx)per sample.n_landmarksmust match across samples.- type landmark_spectra:
sequence of np.ndarray
- param grid_shapes:
Per-sample
(ny, nx)of the original rasterized image.- type grid_shapes:
sequence of tuple[int, int]
- param target_spectra:
Per-sample spectra to which the recovered rotations are applied. Any first-axis dimension (e.g. full gene panel). If
None, only the angles are returned.- type target_spectra:
sequence of np.ndarray, optional
- param fft_solver:
FFT layout of both inputs.
fft2is recommended so the full angular content is present.- type fft_solver:
{‘fft2’, ‘rfft2’}, default ‘fft2’
- param reference_index:
- type reference_index:
int, default 0
- param n_theta:
- type n_theta:
int, default 180
- param n_radius:
- type n_radius:
int, default 64
- param progress:
- type progress:
bool, default False
- returns:
rotated (list of np.ndarray or None) – Per-sample rotated target spectra (or
Nonewhentarget_spectrais omitted).angles_deg (np.ndarray) –
(n_samples,)recovered rotation angles in degrees. Reference angle is 0.
- raises ValueError:
If
reference_indexis out of range, iflandmark_spectrasamples disagree onn_landmarks, or iftarget_spectralength does not match.
- Parameters:
- Return type:
- quadsv.comparators.multisample.apply_rotations_to_spectra(spectra, grid_shapes, angles_deg, *, fft_solver='fft2', progress=False)[source]#
Rotate each sample’s 2D power spectra by a per-sample angle.
- Parameters:
spectra (sequence of np.ndarray) – Per-sample 2D power spectra — any first-axis dimension (e.g. full
n_genes). Shape(n, ny, n_kx)with(ny, n_kx)matchingfft_solver.grid_shapes (sequence of tuple[int, int]) – Per-sample
(ny, nx)of the original rasterized image.angles_deg (np.ndarray) – Per-sample rotation angles in degrees (e.g. produced by
estimate_rotations_from_landmarks()). Length must equallen(spectra).fft_solver ({'fft2', 'rfft2'}, default 'fft2') – FFT layout of
spectra.progress (bool, default False) – Show a tqdm bar across samples.
- Returns:
rotated – Per-sample rotated spectra with the same shape as the input.
- Return type:
list of np.ndarray
Notes
Rotation is done on the 2D power spectrum (fftshifted so DC sits at the centre), not back on the spatial image. That is enough for any downstream analysis that operates on aligned spectra (radial or 2D-bin tests). Samples whose angle is exactly 0 are passed through as-is.
- quadsv.comparators.multisample.compare_glm(spectra, design, contrast, gene_names=None, statistic='log_l2', null='wald', freq_weights=None, normalize_shape=False)[source]#
Generalized two-group / continuous-covariate test via a design matrix.
Generalises
compare_two_groups()from binary group labels to an arbitrary GLM design matrix and a single-DOF linear contrast. The binary case is recovered exactly by passingdesign=pd.DataFrame({"group": groups})andcontrast="group"; p-values matchcompare_two_groups()to machine precision.- Parameters:
spectra (np.ndarray) –
(n_samples, n_genes, K)spectral features (raw, not logged).design (pd.DataFrame or np.ndarray) – Sample-level metadata.
DataFramecolumns are auto-encoded viapatsy(treatment-coded categoricals + intercept);ndarrayis passed through as the design matrix verbatim (caller responsible for the intercept column).contrast (str, dict, or np.ndarray) –
Linear-contrast specification:
str— name of a design column. Auto-resolves treatment-coded categoricals (e.g.,"genotype"matches"genotype[T.TG]"). Multi-DOF (3+ level factor) contrasts must be passed as an explicit dict or ndarray.dict[str, float]— coefficient per design column.np.ndarrayof lengthp— raw contrast vector.
gene_names (sequence of str, optional)
statistic ({'log_l2'}, default 'log_l2') – Currently only
log_l2is supported in the GLM path.null ({'wald'}, default 'wald') – Only the analytic Wald-type null is supported here. Permutation nulls for continuous covariates are intentionally deferred (naive row permutation breaks the X-y joint distribution under nuisance covariates; correct alternatives like Freedman–Lane add complexity without a clear payoff over the analytic Wald). Pass
null="permutation"only viacompare_two_groups()(binary labels) for permutation-based tests.normalize_shape (bool, default False) – If True, divide each per-(sample, gene) spectrum by its sum along the trailing (frequency) axis before the GLM is fit (same semantics as in
compare_two_groups()). Use to isolate shape-only / frequency-redistribution signals along the design contrast — the test then only fires when the relative distribution of power across radial frequencies varies with the contrast, independent of overall amplitude.freq_weights (ndarray | None)
- Returns:
Columns
Feature,Statistic,P_value,P_adj— same schema ascompare_two_groups().- Return type:
pd.DataFrame
- Raises:
ValueError – If shapes are inconsistent or
contrastcannot be resolved.NotImplementedError – If
null='permutation'is requested.
- quadsv.comparators.multisample.compare_two_groups(spectra, groups, gene_names=None, statistic='log_l2', null='wald', n_perm=1000, random_state=None, n_jobs=1, freq_weights=None, n_perm_max=10000, normalize_shape=False)[source]#
Test, for every gene, whether its spatial-pattern spectrum differs between two groups.
- Parameters:
spectra (np.ndarray) – Per-sample spectral features of shape
(n_samples, n_genes, K).groups (np.ndarray) – Group labels of length
n_samplestaking exactly two distinct values (mapped internally to 0/1 in sorted order).gene_names (sequence of str, optional) – Names for the gene axis. If None, integer indices are used.
statistic ({'log_l2', 'welch_t_cauchy'}, default 'log_l2') –
Test statistic:
'log_l2'— (optionally weighted) L2 distance between mean log-spectra. Global / summary statistic. Pair withnull='wald'for an analytic mixture-χ² null that bypasses the small-n permutation BH-floor;null='permutation'(default) falls back to label permutations with exact enumeration whenC(n, n_a) ≤ n_perm_max.'welch_t_cauchy'— per-bin Welch two-sided t-test with analytic (t-distribution) p-values combined across bins via Cauchy combination. Analytic is the whole point: permutation p-values would floor at1/(n_perm + 1)per bin, which would also floor the gene-level combined p-value and destroy BH-FDR power across thousands of genes. Yields an extraP_value_per_bincolumn.
null ({'wald', 'permutation'}, default 'wald') –
Null-distribution method.
'wald'(the default) uses an analytic Wald-type test for the L2 quadratic form: under H₀ the statisticT² = D'WDis distributed as a weighted sum of χ²₁ variables whose tail is integrated via Liu’s approximation (seequadsv.statistics.liu_sf()).'permutation'uses the empirical sample-label permutation null and is the only option that respects then_perm/random_state/n_perm_maxarguments. Currentlynull='wald'is only supported forstatistic='log_l2'; raisesValueErrorotherwise.welch_t_cauchyignores this argument.Sample-size guidance (residual df =
n_a + n_b - 2):df ≥ 4 (n_a + n_b ≥ 6):
'wald'recommended — strong calibration + sensitivity; sweeps the top of our benchmark.df ≥ 3 (n_a + n_b ≥ 5):
'wald'acceptable.df < 3 (n_a + n_b ≤ 4):
'wald'emits aUserWarning; σ̂² has ≥ 67% relative noise so per-test calibration may be anti-conservative. Preferstatistic='welch_t_cauchy'(per-bin Welch t with proper df-corrected denominator) or stay withnull='permutation'if the cohort allows enough exact relabellings.
n_perm (int, default 1000) – Number of label permutations for the null distribution. Ignored when
statistic='welch_t_cauchy'ornull='wald'.random_state (int, optional) – Seed for the permutation RNG (ignored for
'welch_t_cauchy').n_jobs (int, default 1) – Reserved for future parallelism over genes; currently unused (the per-stat implementations are already vectorized over genes).
freq_weights (np.ndarray, optional) – Only used by
statistic='log_l2'. Non-negative weights of lengthK(the number of frequency bins); internally renormalized to sum-1. Lets the user emphasize specific frequencies — e.g., a polynomial low-pass shape to mirror a CAR kernel, or an exponential high-pass shape to mirror a Gaussian kernel.None(default) means uniform weights.n_perm_max (int, default 10000) – If the total number of distinct two-group relabellings
C(n_samples, n_a)is at most this, every possible relabelling is enumerated (exact permutation test) andn_permis overridden to the enumeration count. This is both faster and strictly more accurate than sampling in the small-sample regime (e.g. 6-vs-6 → 924 partitions, 5-vs-5 → 252). Above the threshold the test falls back ton_permrandom shuffles.normalize_shape (bool, default False) – If True, divide each per-(sample, gene) spectrum by its sum along the trailing (frequency) axis before the statistic is computed (i.e., apply
normalize_shape()tospectrafirst). Use to isolate shape-only / frequency-redistribution signals — the test then only fires when the relative distribution of power across radial frequencies varies with the contrast, independent of overall amplitude. Works with every validstatistic=value.
- Returns:
Columns
Feature,Statistic,P_value,P_adj(BH-FDR), sorted by descending statistic. Whenstatistic='welch_t_cauchy', the frame also carries aP_value_per_binobject column — each entry is an(K,)numpy array of per-bin permutation p-values for that gene.- Return type:
pd.DataFrame
- Raises:
ValueError – If
statisticis unknown,groupsdoes not contain exactly two values, or shapes are inconsistent.
- quadsv.comparators.multisample.compare_two_groups_masked(spectra, groups, presence, gene_names=None, statistic='log_l2', null='wald', n_perm=1000, random_state=None, min_samples_per_group=2, freq_weights=None, n_perm_max=10000, normalize_shape=False)[source]#
Per-gene two-group pattern test with incomplete data across samples.
For each gene, only the subset of samples with
presence[:, g] == Truecontributes to the observed statistic and to the label-permutation null. Genes that fail to reachmin_samples_per_groupobservations in at least one group are reported withNaNp-values and the number of observed samples per group, so the user sees why they were skipped.- Parameters:
spectra (np.ndarray) –
(n_samples, n_genes, K).groups (np.ndarray) –
(n_samples,), exactly two distinct labels.presence (np.ndarray) –
(n_samples, n_genes)boolean mask.True= gene is observed in that sample (contributes);False= gene is absent (ignored).gene_names (sequence of str, optional)
statistic ({'log_l2', 'welch_t_cauchy'}, default 'log_l2')
null ({'wald', 'permutation'}, default 'wald') –
Null-distribution method.
'wald'(the default) uses an analytic Wald-type test adapted for the masked case via a mask-aware pooled-Σ estimator: a single global(K, K)Σ is accumulated across every gene’s present (sample, gene) cells (each gene contributesn_g - 2residual degrees of freedom), and per-gene noncentrality scalingv_{c,g} = 1/n_a_g + 1/n_b_gadjusts the eigenvalues for that gene’s specific cohort. Cross-bin correlation structure is taken to be homogeneous across genes (the same A3 assumption used incompare_two_groups()with Wald). Empirical calibration on synthetic missingness up to 50% matches the unmasked Wald path. Currently supported only withstatistic='log_l2'.'permutation'runs a per-gene permutation test, exact-enumerated whenC(n_g, n_a_g) ≤ n_perm_max(most genes at small samples).n_perm (int, default 1000)
random_state (int, optional)
min_samples_per_group (int, default 2) – Minimum observed samples in each group for the gene to be tested.
freq_weights (np.ndarray, optional) – Only consumed by
statistic='log_l2'(same semantics ascompare_two_groups()).normalize_shape (bool, default False) – If True, divide each per-(sample, gene) spectrum by its sum along the trailing (frequency) axis before the statistic is computed (same semantics as in
compare_two_groups()). Use to isolate shape-only / frequency-redistribution signals. Works with every validstatistic=value.n_perm_max (int)
- Returns:
Columns
Feature,Statistic,P_value,P_adj,n_obs_A,n_obs_B. For'welch_t_cauchy'aP_value_per_bincolumn is also included (Nonefor skipped genes). BH-FDR is computed only over tested genes.- Return type:
pd.DataFrame
- quadsv.comparators.multisample.compare_two_groups_scalar(values, groups, gene_names=None, null='wald', n_perm=1000, random_state=None, n_perm_max=10000)[source]#
Per-gene two-sample test on scalar per-sample values (classical DE).
The natural companion to
compare_two_groups(): tested on the DC scalars (per-gene grid means) produced bycompute_sample_spectrum(), the result is statistically independent of the spectral pattern test because DC and AC are orthogonal by construction (the FFT pipeline always mean- centres each gene’s grid before computing power).Two null distributions are supported, chosen via
null:null='wald'(default) — analytic two-sided Welch t-test p-value from the Welch-Satterthwaite t-distribution. No permutation BH-floor; the natural counterpart tocompare_two_groups()’snull='wald'analytic path on the spectral side. The Welch t is itself a Wald-type statistic (point estimate / estimated SE under H₁), hence the shared kwarg name across the API surface.null='permutation'— exact / approximate permutation null onabs(Welch t). More conservative at smalln; produces identical p-values as a Mann-Whitney-style rank test up to ties when the permutation pool is exhausted.
- Parameters:
values (np.ndarray) – Per-sample per-gene scalars of shape
(n_samples, n_genes)— e.g., log-normalized mean expression on each slide.groups (np.ndarray) – Group labels of length
n_sampleswith exactly two distinct values.gene_names (sequence of str, optional) – Gene names. Integer indices if None.
null ({'wald', 'permutation'}, default 'wald') – Null-distribution method.
'wald'returns analytic Welch-Satterthwaite t-distribution p-values;'permutation'runs a label-shuffle null onabs(Welch t).n_perm (int, default 1000) – Number of sample-label permutations for
null='permutation'. Ignored whennull='wald'.random_state (int, optional) – Seed for the permutation RNG. Ignored when
null='wald'.n_perm_max (int, default 10000) – Cap on enumerated unique permutations.
- Returns:
Columns
Feature,Statistic(abs(Welch t)),Mean_diff(mean_groupA − mean_groupB),P_value,P_adj(BH-FDR), sorted by descendingStatistic.- Return type:
pd.DataFrame
- Raises:
ValueError – If shapes are inconsistent,
groupsdoes not contain exactly two distinct values, ornullis unknown.
- quadsv.comparators.multisample.compute_sample_spectrum(sample, fft_solver='rfft2', workers=None, return_dc=False)[source]#
Compute the 2D power spectrum of every gene in a single sample.
The spatial signal is mean-centred per gene before the FFT so that the resulting power spectrum carries only the AC component of the pattern — i.e. the
k=0(DC) bin is exactly zero and low-kleakage from per- sample mean shifts is eliminated. The separated DC scalars (the per-sample per-gene grid means) can be returned alongside the spectrum withreturn_dc=Trueand are the natural target for a classical differential expression test complementary to the spectral pattern test.- Parameters:
sample (np.ndarray) – Rasterized expression of shape
(n_genes, ny, nx).fft_solver ({'fft2', 'rfft2'}, default 'rfft2') – FFT routine forwarded to
quadsv.kernels.fft.power_spectrum_2d().workers (int, optional) – Parallel workers forwarded to
scipy.fft.return_dc (bool, default False) – If True, also return a
(n_genes,)array of per-gene grid means (DC scalars of the uncentered signal).
- Returns:
Power spectra of shape
(n_genes, ny, n_kx). Ifreturn_dc=True, also returns a(n_genes,)DC array.- Return type:
np.ndarray or tuple[np.ndarray, np.ndarray]
- Raises:
ValueError – If
sampleis not 3D.
- quadsv.comparators.multisample.estimate_rotations_from_landmarks(landmark_spectra, grid_shapes, *, fft_solver='fft2', reference_index=0, n_theta=180, n_radius=64, progress=False)[source]#
Estimate the per-sample rotation that best aligns every landmark spectrum to the reference sample’s corresponding landmark.
For each non-reference sample the routine picks a single rotation angle that maximises the sum over landmarks of the per-landmark circular cross-correlation along the polar-angle axis — i.e. each landmark aligns to its same-index counterpart in the reference (not to a mean template). This is strictly stronger than mean-template alignment because it ignores cross-landmark noise (the off-diagonal
i ≠ jterms that mean-of-means picks up) and picks up anisotropy shared across every landmark at a common orientation.- Parameters:
landmark_spectra (sequence of np.ndarray) – Per-sample landmark spectra. Shape
(n_landmarks, ny, n_kx)with(ny, n_kx)followingfft_solver. The first dimension (n_landmarks) must match across samples — landmarkjin sample A is compared to landmarkjin sample B.grid_shapes (sequence of tuple[int, int]) – Per-sample
(ny, nx)of the original rasterized image.fft_solver ({'fft2', 'rfft2'}, default 'fft2') – FFT layout of
landmark_spectra— rfft2 spectra are expanded to full 2D before resampling to preserve angular content.reference_index (int, default 0) – Which sample’s landmarks act as the rotation reference (its angle is fixed at 0).
n_theta (int, default 180) – Angular resolution of the polar resampling. Recovered angles are accurate to
180 / n_thetadegrees.n_radius (int, default 64) – Radial resolution of the polar resampling.
progress (bool, default False) – If True, show a tqdm bar over non-reference samples.
- Returns:
angles_deg –
(n_samples,)recovered rotation angles in degrees. Reference angle is exactly 0.- Return type:
np.ndarray
- Raises:
ValueError – If
reference_indexis out of range or any two samples have inconsistentn_landmarks.
- quadsv.comparators.multisample.normalize_background(spectra, *, axis=-2, eps=1e-12)[source]#
Cancel per-sample multiplicative gain via cross-gene geometric-mean centring.
For each (sample, frequency-bin) pair, every gene’s power is divided by the geometric mean of the spectrum across the genes axis. Use this to correct per-sample multiplicative gain (sequencing depth, antibody titre, dewaxing efficiency) that scales every gene’s spectrum at every frequency by a sample-level factor.
- Parameters:
spectra (np.ndarray) – Non-negative spectra \(P\) with shape
(..., G, K)(\(G\) alongaxis, \(K\) frequency bins on the trailing axis). Any leading dimensions (e.g.,n_samples) are broadcast over.axis (int, default -2) – Axis along which the cross-gene geometric mean is taken (the genes axis).
eps (float, default 1e-12) – Floor \(\varepsilon\) added before the logarithm to keep zeros finite.
- Returns:
Background-normalized spectra \(\tilde P\), same shape as
spectra. Never mutates the input.- Return type:
np.ndarray
Notes
Let \(P\) denote the input spectrum, \(G\) the number of genes (length of
axis), \(K\) the number of frequency bins, and \(\varepsilon\) theepsfloor. The per-bin geometric-mean background is\[b_{k} = \exp\!\Bigl( \tfrac{1}{G} \sum_{g'=1}^{G} \log\bigl(P_{g',k} + \varepsilon\bigr) \Bigr),\]and the output is the per-bin gene-wise quotient
\[\tilde P_{g,k} = \frac{P_{g,k}}{b_{k}}.\]Equivalently, in log-space this is per-bin mean centring across the genes axis,
\[\log \tilde P_{g,k} = \log\bigl(P_{g,k} + \varepsilon\bigr) - \tfrac{1}{G} \sum_{g'=1}^{G} \log\bigl(P_{g',k} + \varepsilon\bigr),\]so after the transform \(\prod_{g} \tilde P_{g,k} = 1\) at every bin \(k\) — the cross-gene geometric mean at every frequency is unity.
The operation is equivalent to a per-bin OLS regression of \(\log P_{\cdot,k}\) against a constant (the cross-gene mean) followed by exponentiation. With a per-sample one-hot covariate stacked across all (sample, gene) rows, the residuals match \(\log \tilde P\) row-for-row, so running this function sample-by-sample is identical to fitting a one-hot sample-ID covariate in log-space and residualising.
Companion functions:
normalize_covariates()removes per-bin bias linear in user-supplied covariate spectra (cell-type proportion maps, tissue domains, housekeeping templates).normalize_shape()removes per-(sample, gene) amplitude by L1-normalising along the frequency axis.
Examples
>>> import numpy as np >>> rng = np.random.default_rng(0) >>> spec = rng.lognormal(size=(2, 5, 8)) # (n_samples, G, K) >>> P_tilde = normalize_background(spec) >>> P_tilde.shape (2, 5, 8) >>> # Cross-gene geometric mean at each (sample, bin) is unity: >>> bool(np.allclose(np.prod(P_tilde, axis=-2), 1.0)) True
- quadsv.comparators.multisample.normalize_covariates(spectra, covariate_spectra, *, fit_intercept=True, eps=1e-12)[source]#
Residualise log-spectra against the log of covariate spectra.
Each gene’s log-spectrum is regressed (per gene, OLS in log-space) on the log of the supplied covariate spectra plus an optional intercept; the function exponentiates and returns the residual spectrum. Use to remove the multiplicative contribution of structured per-bin templates (cell-type proportion maps, tissue-domain indicators, housekeeping composite expression) from every gene’s per-frequency power.
Operating in log-space matches the multiplicative noise model of spectral data, keeps the output strictly positive (so the result composes cleanly with the downstream
log_l2test), and makes this helper commute exactly withnormalize_background()(orthogonal projections along orthogonal axes — see Notes).- Parameters:
spectra (np.ndarray) – Non-negative gene spectra \(P\) of shape
(G, K)to residualise.covariate_spectra (np.ndarray) – Non-negative covariate spectra \(C\) of shape
(n_cov, K).fit_intercept (bool, default True) – If True, prepend a column of ones to the design matrix \(X\) so per-gene log-amplitude offsets along the frequency axis are absorbed.
eps (float, default 1e-12) – Floor \(\varepsilon\) added inside \(\log(\cdot)\) on both
spectraandcovariate_spectrato keep zeros finite.
- Returns:
Residual spectra \(\tilde P\) of shape
(G, K), strictly positive. Never mutates the input.- Return type:
np.ndarray
- Raises:
ValueError – If
covariate_spectrahas a different last-axis length thanspectra.
Notes
Let \(P \in \mathbb{R}_{\geq 0}^{G \times K}\) denote the input spectra (\(G\) genes, \(K\) frequency bins) and \(C \in \mathbb{R}_{\geq 0}^{n_{\mathrm{cov}} \times K}\) the covariate spectra. Build the log-design matrix
\[X = \bigl[\, \mathbf{1}_{K} \;\big|\; \log(C^{\top} + \varepsilon) \,\bigr] \;\in\; \mathbb{R}^{K \times (n_{\mathrm{cov}} + 1)},\]dropping the leading column \(\mathbf{1}_{K}\) when
fit_intercept=False. Fit per-gene OLS coefficients via the Moore-Penrose pseudoinverse \(X^{+}\) against the log of the response,\[\hat\beta_{g} = X^{+}\, \bigl[ \log( P_{g,\cdot} + \varepsilon ) \bigr]^{\top} \;\in\; \mathbb{R}^{n_{\mathrm{cov}} + 1},\]and return the exponentiated residual
\[\tilde P_{g,k} = \exp\!\Bigl( \log( P_{g,k} + \varepsilon ) - X_{k,\cdot}\,\hat\beta_{g} \Bigr).\]Equivalently,
\[\log \tilde P_{g,\cdot}^{\top} = \bigl( I_{K} - X X^{+} \bigr)\, \bigl[ \log( P_{g,\cdot} + \varepsilon ) \bigr]^{\top},\]i.e., the orthogonal projection of each gene’s log-spectrum onto the orthogonal complement of the column space of \(X\), then exponentiated.
Commutativity with
normalize_background(). In log-space the two operations are left- vs right-multiplication of the \(G \times K\) log-spectrum matrix by orthogonal-projection matrices on disjoint axes,\[\mathrm{bg}: \;\log P \;\mapsto\; \bigl( I_{G} - \tfrac{1}{G}\mathbf{1}_{G}\mathbf{1}_{G}^{\top} \bigr)\,\log P, \qquad \mathrm{cov}: \;\log P \;\mapsto\; \log P \,\bigl( I_{K} - X X^{+} \bigr).\]Left- and right-multiplication trivially commute, so we have the exact identity
normalize_background(normalize_covariates(P)) == normalize_covariates(normalize_background(P)).With
fit_intercept=Trueand no covariates (emptycovariate_spectra), this reduces to per-gene log-mean centring along the frequency axis,\[\tilde P_{g,k} = \frac{P_{g,k} + \varepsilon} {\exp\!\bigl(\tfrac{1}{K} \sum_{k'=1}^{K}\log(P_{g,k'} + \varepsilon) \bigr)},\]i.e., dividing each gene’s spectrum by its own cross-bin geometric mean — a per-gene companion to
normalize_background()’s per-bin cross-gene operation, distinct fromnormalize_shape()’s arithmetic-mean / sum-1 normalisation.Companion functions:
normalize_background()removes per-sample multiplicative gain via cross-gene geometric-mean centring in log-space (perpendicular axis to this function).normalize_shape()removes per-(sample, gene) amplitude by L1-normalising along the frequency axis.
Examples
>>> import numpy as np >>> rng = np.random.default_rng(0) >>> spec = rng.lognormal(size=(20, 8)) # (G, K) >>> cov = rng.lognormal(size=(2, 8)) # (n_cov, K) >>> resid = normalize_covariates(spec, cov) >>> resid.shape (20, 8) >>> bool((resid > 0).all()) # log-space output is strictly positive True
- quadsv.comparators.multisample.normalize_shape(spectra, *, axis=-1, eps=1e-12)[source]#
Project each spectrum onto the probability simplex along
axis.Each fibre along
axisis divided by its L1 norm, so the result is a proper probability distribution over the entries along that axis. Two fibres that differ only by a positive scalar produce identical outputs — only the shape of the power-vs-frequency curve survives, the overall scale is removed.- Parameters:
spectra (np.ndarray) – Non-negative spectra \(P\). Any leading dimensions are preserved; normalisation acts along
axisonly.axis (int, default -1) – Axis to L1-normalise along (typically the trailing frequency-bin axis).
eps (float, default 1e-12) – Floor \(\varepsilon\) on the per-fibre sum to avoid division by zero.
- Returns:
Shape-normalized spectra \(\tilde P\), same shape as
spectra, summing to 1 alongaxis. Never mutates the input.- Return type:
np.ndarray
Notes
Let \(P\) denote the input spectrum with \(K\) entries along
axisand \(\varepsilon\) theepsfloor. The output is the per-fibre L1 quotient\[\tilde P_{\ldots,k} = \frac{P_{\ldots,k}} {\sum_{k'=1}^{K} P_{\ldots,k'} + \varepsilon},\]so \(\sum_{k} \tilde P_{\ldots,k} = 1\) for every leading-index combination (modulo the \(\varepsilon\) floor; fibres whose total sum is below \(\varepsilon\) are effectively returned unchanged because the numerator dominates the floor).
Equivalently, in log-space this is per-fibre log-sum centring,
\[\log \tilde P_{\ldots,k} = \log P_{\ldots,k} - \log\!\Bigl( \textstyle\sum_{k'=1}^{K} P_{\ldots,k'} + \varepsilon \Bigr).\]After this transform every fibre is a probability vector over frequency bins, so distances such as Jensen-Shannon and total-variation are well-defined between fibres.
Used internally by the spectrum comparison functions (
compare_two_groups(),compare_two_groups_masked(),compare_glm()) when theirnormalize_shape=Truekeyword argument is set — the differential-frequency test then fires only on shape redistribution across radial bins, not on overall amplitude changes.Companion functions:
normalize_background()removes per-sample multiplicative gain via cross-gene geometric-mean centring in log-space.normalize_covariates()removes per-bin bias linear in user-supplied covariate spectra.
Examples
>>> import numpy as np >>> x = np.array([[1.0, 2.0, 4.0], [10.0, 20.0, 40.0]]) >>> P_tilde = normalize_shape(x, axis=-1) >>> bool(np.allclose(P_tilde.sum(axis=-1), 1.0)) True >>> bool(np.allclose(P_tilde[0], P_tilde[1])) # only the shape survives True
- quadsv.comparators.multisample.radial_bin_spectrum(spectrum, grid_shape, n_bins=30, fft_solver='rfft2', exclude_dc=True, spacing=None, edges=None)[source]#
Bin a 2D power spectrum into
n_binsradial frequency bins.By default the binning axis is the normalized radial frequency \(k = \sqrt{(k_x/n_x)^2 + (k_y/n_y)^2} \in [0,\,\sqrt{0.5}]\), so spectra from samples with different
(ny, nx)map onto the same K bins. Passingspacing=(dy, dx)(in physical units, e.g. μm per cell) switches the binning axis to cycles per unit length (cycles/μm → multiply by 1000 for cycles/mm), so bins are directly comparable across samples with different physical resolutions. In that case, also passedgesto enforce a common bin grid across samples.- Parameters:
spectrum (np.ndarray) – Power spectrum of shape
(..., ny, n_kx). Leading dims (e.g., genes, samples) are preserved.grid_shape (tuple[int, int]) – Original
(ny, nx)of the rasterized image (needed becauserfft2only stores half of the kx axis).n_bins (int, default 30) – Number of radial bins. Ignored when
edgesis supplied.fft_solver ({'fft2', 'rfft2'}, default 'rfft2') – FFT solver used to produce
spectrum. Must match.exclude_dc (bool, default True) – If True, drop the zero-frequency (DC) bin from the output.
spacing (tuple[float, float], optional) – Physical spacing
(dy, dx)per grid cell (e.g., μm). If given, the binning axis is physical frequency in cycles per unit length.edges (np.ndarray, optional) – Explicit monotonically increasing bin edges (length
n_bins + 1) in the same frequency units asspacing(or normalized ifspacingis None). When supplied, this overridesn_binsand gives every sample identical bin boundaries — required for cross-sample comparisons in physical units.
- Returns:
Radial spectra of shape
(..., n_bins)(orn_bins - 1whenexclude_dc=True).- Return type:
np.ndarray
- Raises:
ValueError – If
spectrum’s last two dims do not match the expected shape implied bygrid_shapeandfft_solver.