quadsv.detectors.irregular
==========================

.. py:module:: quadsv.detectors.irregular


Classes
-------

.. autoapisummary::

   quadsv.detectors.irregular.DetectorIrregular


Module Contents
---------------

.. py:class:: DetectorIrregular(kernel_method = 'matern', backend = 'matrix', **kernel_params)

   Bases: :py:obj:`quadsv.detectors.base.Detector`


   Detect spatial patterns on **irregular** samples (AnnData spots / cells).

   Univariate (Q-test) and bivariate (R-test) kernel-based spatial statistics.
   Supports two backends:

   - ``backend='matrix'`` — :class:`~quadsv.MatrixKernel` (dense or implicit
     sparse-precision, auto-selected by ``n``). Good up to ~10⁴ spots.
   - ``backend='nufft'`` — :class:`~quadsv.NUFFTKernel`, ``O(n log n)`` quadratic
     forms on arbitrary point sets. Recommended for ≥ 10⁴ spots.

   The core test statistics are:

   - Univariate: :math:`Q = \mathbf{x}^T \mathbf{K} \mathbf{x}`
   - Bivariate: :math:`R = \mathbf{x}^T \mathbf{K} \mathbf{y}`
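
   As a rough illustration of these two quadratic forms (not the library's implementation), the NumPy sketch below builds a dense Gaussian kernel by hand and evaluates both statistics. The bandwidth of ``0.2``, the simulated coordinates, and the z-scoring of the inputs are assumptions made only for this example.

   .. code-block:: python

      import numpy as np

      rng = np.random.default_rng(0)
      n = 50
      coords = rng.uniform(size=(n, 2))          # simulated spot coordinates

      # Dense Gaussian kernel from pairwise squared distances (assumed bandwidth 0.2).
      d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
      K = np.exp(-d2 / (2 * 0.2 ** 2))

      def zscore(v):
          """Center and scale a vector to unit variance."""
          return (v - v.mean()) / v.std()

      x = zscore(rng.standard_normal(n))
      y = zscore(rng.standard_normal(n))

      Q = x @ K @ x   # univariate statistic: x^T K x
      R = x @ K @ y   # bivariate statistic:  x^T K y

   Because the kernel is symmetric, swapping ``x`` and ``y`` leaves ``R`` unchanged.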

   .. rubric:: Workflow

   1. **Construct** with kernel method + backend + kernel hyperparameters.
   2. **Setup** with :meth:`setup_data`, passing the :class:`anndata.AnnData`
      plus a spatial source: ``obsm_key`` for coordinates in ``adata.obsm``, or
      ``obsp_key`` for a precomputed adjacency / distance matrix in ``adata.obsp``.
   3. **Compute** with :meth:`compute_qstat` / :meth:`compute_rstat`.

   :param kernel_method: One of ``'gaussian'``, ``'matern'``, ``'moran'``, ``'graph_laplacian'``,
                         ``'car'``.
   :type kernel_method: str, default ``'matern'``
   :param backend: Kernel backend.
   :type backend: {``'matrix'``, ``'nufft'``}, default ``'matrix'``
   :param \*\*kernel_params: Method- and backend-specific kernel hyperparameters. Matrix backend:
                             ``bandwidth``, ``nu``, ``rho``, ``k_neighbors``, ``standardize``.
                             NUFFT backend: ``bandwidth``, ``nu``, ``rho``, ``neighbor_degree``,
                             plus grid controls ``grid_shape``, ``spacing``, ``unit_scale``,
                             ``oversample``, ``eps``.

   :ivar backend\_: Which backend was selected at construction.
   :vartype backend\_: {``'matrix'``, ``'nufft'``}
   :ivar adata: Input container set by :meth:`setup_data`.
   :vartype adata: :class:`anndata.AnnData` or None
   :ivar min_cells: Minimum non-zero count per feature; set by :meth:`setup_data`.
   :vartype min_cells: int or None
   :ivar kernel\_: The built kernel; populated by :meth:`setup_data`.
   :vartype kernel\_: :class:`~quadsv.kernels.Kernel` or None
   :ivar kernel_method\_, kernel_params\_, n: See :class:`Detector`.


   .. rubric:: Examples

   >>> import anndata as ad, numpy as np
   >>> from quadsv import DetectorIrregular
   >>> rng = np.random.default_rng(0)
   >>> adata = ad.AnnData(X=rng.standard_normal((200, 5)))
   >>> adata.obsm["spatial"] = rng.standard_normal((200, 2))
   >>> det = DetectorIrregular(kernel_method="car", rho=0.9, k_neighbors=8)
   >>> det.setup_data(adata, min_cells=5)  # doctest: +ELLIPSIS
   <DetectorIrregular ...>
   >>> # q = det.compute_qstat()


   .. py:method:: compute_qstat(source = 'var', features = None, n_jobs = -1, layer = None, return_pval = True, chunk_size = 'auto', show_progress = True)

      Compute univariate spatial Q-statistic for selected features.

      Tests each feature for significant spatial clustering or dispersion using the
      pre-built kernel. Parallelizes across features and applies Benjamini-Hochberg
      multiple testing correction.
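
      For readers unfamiliar with the Benjamini-Hochberg step-up adjustment, here is a minimal NumPy sketch of it. The helper name ``bh_adjust`` is hypothetical and the code is illustrative only; it is not claimed to be the routine this method calls internally.

      .. code-block:: python

         import numpy as np

         def bh_adjust(pvals):
             """Return Benjamini-Hochberg adjusted p-values (hypothetical helper)."""
             p = np.asarray(pvals, dtype=float)
             m = p.size
             order = np.argsort(p)
             # Scale the i-th smallest p-value by m / i.
             scaled = p[order] * m / np.arange(1, m + 1)
             # Enforce monotonicity, working down from the largest p-value.
             scaled = np.minimum.accumulate(scaled[::-1])[::-1]
             adjusted = np.empty(m)
             adjusted[order] = np.clip(scaled, 0.0, 1.0)
             return adjusted

         p_adj = bh_adjust([0.01, 0.04, 0.03, 0.20])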

      :param source: Feature source: 'var' (genes) or 'obs' (metadata columns).
      :type source: str, default 'var'
      :param features: Feature names to test. If None, tests all features in source.
      :type features: Optional[List[str]]
      :param n_jobs: Number of parallel jobs. -1 uses all available cores; 1 for sequential.
      :type n_jobs: int, default -1
      :param layer: If source='var', which layer to use (e.g., 'raw', 'log1p'). If None, uses .X.
      :type layer: Optional[str]
      :param return_pval: If True, returns p-values and BH-corrected p-values. If False, returns Q only.
      :type return_pval: bool, default True
      :param chunk_size: Number of features each worker densifies at once (inner batch). ``'auto'``
                         targets ~256 MB per batch using :meth:`_auto_chunk_size`, yielding
                         ``chunk_size ≈ clip(256 MB / (4 · n · 8 B), 16, 512)``. Override with an
                         integer when memory is tight or you want deterministic batching.
      :type chunk_size: int or ``'auto'``, default ``'auto'``
      :param show_progress: Show a tqdm progress bar over worker chunks.
      :type show_progress: bool, default True

      :returns: **df** -- Results sorted by Q (descending). Columns:

                - Feature: feature name
                - Q: test statistic (univariate spatial variability)
                - Z_score: standardized Q by null mean/std
                - P_value: tail probability under null (if return_pval=True)
                - P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)
      :rtype: pd.DataFrame

      :raises ValueError: If the kernel has not been initialized (call :meth:`setup_data` first), or ``source`` is invalid.

      .. rubric:: Notes

      Under H₀: the feature has no spatial structure.
      Under H₁: the feature carries significant spatial signal (clustering or dispersion).

      Zero-variance features are assigned Q=0, P_value=1.0.

      The null-distribution approximation is auto-selected from
      ``self.kernel_method_`` (``'clt'`` for Moran's I, ``'welch'`` for all other
      kernels) and cannot be overridden through this method. For full control
      over the null method (including ``'liu'``), call
      :func:`quadsv.statistics.spatial_q_test` directly.
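
      The ``chunk_size='auto'`` heuristic documented for this method can be approximated as follows. This is a sketch under stated assumptions: the function name ``auto_chunk_size`` is hypothetical, "256 MB" is read as ``256 * 2**20`` bytes, and the factor ``4`` is copied verbatim from the documented formula without knowing what it accounts for.

      .. code-block:: python

         def auto_chunk_size(n, target_bytes=256 * 2**20, lo=16, hi=512):
             """Features per batch for n observations (hypothetical helper)."""
             # 8 bytes per float64 value, times the factor 4 from the documented formula.
             per_feature_bytes = 4 * n * 8
             raw = target_bytes // per_feature_bytes
             # Clip the raw estimate into [lo, hi].
             return int(min(max(raw, lo), hi))

         small = auto_chunk_size(10_000)       # hits the upper bound
         medium = auto_chunk_size(100_000)     # falls between the bounds
         large = auto_chunk_size(1_000_000)    # hits the lower bound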

      .. rubric:: Examples

      >>> detector.setup_data(adata)
      >>> results = detector.compute_qstat(source='var', features=['Gene1', 'Gene2'], n_jobs=-1)
      >>> top_genes = results.iloc[:10]


   .. py:method:: compute_rstat(features_x = None, features_y = None, source = 'var', n_jobs = -1, layer = None, return_pval = True, chunk_size = 'auto', show_progress = True)

      Compute bivariate spatial R-statistic (cross-spatial correlation) for feature pairs.

      Tests for significant spatial co-variation between pairs of features using
      the pre-built kernel. Supports symmetric (all pairs within one set) or bipartite
      (all X vs Y pairs) modes. Parallelizes computation and applies multiple testing correction.

      :param features_x: Feature names for the first set. If None and features_y is None, uses all features (symmetric mode).
      :type features_x: Optional[List[str]]
      :param features_y: Feature names for the second set. If None, computes all pairwise within features_x.
                         If provided, computes all X vs Y pairs (bipartite mode).
      :type features_y: Optional[List[str]]
      :param source: Feature source: 'var' (genes) or 'obs' (metadata columns).
      :type source: str, default 'var'
      :param n_jobs: Number of parallel jobs. -1 uses all available cores; 1 for sequential.
      :type n_jobs: int, default -1
      :param layer: If source='var', which layer to use (e.g., 'raw', 'log1p'). If None, uses .X.
      :type layer: Optional[str]
      :param return_pval: If True, returns p-values and BH-corrected p-values. If False, returns R only.
      :type return_pval: bool, default True
      :param chunk_size: Number of Y features to batch together when pre-computing ``K @ Y_chunk``.
                         ``'auto'`` uses :meth:`_auto_chunk_size` (~256 MB per batch target);
                         integer values override the heuristic.
      :type chunk_size: int or ``'auto'``, default ``'auto'``
      :param show_progress: Show a tqdm progress bar over the Y-chunk loop.
      :type show_progress: bool, default True

      :returns: **df** -- Results sorted by absolute Z_score (descending). Columns:

                - Feature_1: name of first feature
                - Feature_2: name of second feature
                - R: test statistic (bivariate spatial correlation, range approximately [-1, 1])
                - Z_score: standardized R by null mean/std
                - P_value: two-tailed p-value under null (if return_pval=True)
                - P_adj: Benjamini-Hochberg adjusted p-value (if return_pval=True)
      :rtype: pd.DataFrame

      :raises ValueError: If the kernel has not been initialized, ``features_x`` is ``None`` while ``features_y`` is provided, or no valid pairs are generated.

      .. rubric:: Notes

      Under H₀: features are spatially independent.
      Under H₁: significant spatial co-clustering or co-dispersion.

      Unlike :func:`quadsv.statistics.spatial_r_test`, this method always returns R-statistics
      for every ordered pair in symmetric mode (``features_y=None``), including self-pairs
      and both orderings of each pair. For ``features_x=[A, B, C]``, the output contains
      ``(A, A), (A, B), (A, C), (B, A), (B, B), (B, C), (C, A), (C, B), (C, C)``.
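
      The pair enumeration described here can be mimicked with :func:`itertools.product`. The helper ``enumerate_pairs`` is hypothetical and only illustrates which pairs appear in each mode; the row order of the returned DataFrame is governed by the sort on absolute Z_score, not by this enumeration.

      .. code-block:: python

         from itertools import product

         def enumerate_pairs(features_x, features_y=None):
             """List (Feature_1, Feature_2) pairs for symmetric or bipartite mode."""
             if features_y is None:
                 # Symmetric mode: every ordered pair within features_x, self-pairs included.
                 return list(product(features_x, features_x))
             # Bipartite mode: every X feature against every Y feature.
             return list(product(features_x, features_y))

         symmetric = enumerate_pairs(["A", "B", "C"])
         bipartite = enumerate_pairs(["A", "B"], ["C", "D"])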

      P-value calculation uses a normal approximation based on Tr(K²) and is not
      configurable through this method. For finer control over the null model,
      call :func:`quadsv.statistics.spatial_r_test` directly.

      Zero-variance features are handled gracefully (assigned R=0, P=1).
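
      To illustrate what batching ``K @ Y_chunk`` buys, the sketch below computes all cross statistics :math:`R_{ij} = \mathbf{x}_i^T \mathbf{K} \mathbf{y}_j` chunk by chunk, so that only one ``(n, chunk)`` product is held in memory at a time. The dense Gaussian kernel, the array sizes, and the variable names are assumptions for the example, not the library's internals.

      .. code-block:: python

         import numpy as np

         rng = np.random.default_rng(0)
         n, p, q, chunk = 40, 3, 7, 4

         # Assumed dense Gaussian kernel on simulated coordinates.
         coords = rng.uniform(size=(n, 2))
         d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
         K = np.exp(-d2 / (2 * 0.2 ** 2))

         X = rng.standard_normal((n, p))   # first feature set, one column per feature
         Y = rng.standard_normal((n, q))   # second feature set

         R = np.empty((p, q))
         for start in range(0, q, chunk):
             stop = min(start + chunk, q)
             KY = K @ Y[:, start:stop]      # pre-compute K @ Y_chunk once
             R[:, start:stop] = X.T @ KY    # reuse it for every X feature

      Each kernel product is shared across all X features, which is why the chunk size is expressed in Y features.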

      .. rubric:: Examples

      >>> detector.setup_data(adata)
      >>> # All pairwise correlations within gene set
      >>> results = detector.compute_rstat(features_x=['Gene1', 'Gene2', 'Gene3'], n_jobs=-1)
      >>> # Cross-correlation between two gene sets
      >>> results = detector.compute_rstat(
      ...     features_x=['Gene1', 'Gene2'],
      ...     features_y=['Gene3', 'Gene4'],
      ...     n_jobs=-1
      ... )


   .. py:method:: setup_data(adata, *, obsm_key = 'spatial', obsp_key = None, is_distance = False, min_cells = 1, min_cells_frac = None)

      Attach ``adata``, apply feature filters, build the kernel.

      :param adata: Input container. Must have ``adata.obsm[obsm_key]`` (unless
                    ``obsp_key`` is provided instead).
      :type adata: :class:`anndata.AnnData`
      :param obsm_key: Key in ``adata.obsm`` holding ``(n_obs, 2)`` spatial coordinates.
                       Used when ``obsp_key`` is ``None``.
      :type obsm_key: str, default ``'spatial'``
      :param obsp_key: If provided, build the kernel from ``adata.obsp[obsp_key]``
                       instead of from coordinates. Not compatible with ``backend='nufft'``.
      :type obsp_key: str, optional
      :param is_distance: When ``obsp_key`` is given: treat the matrix as pairwise distances
                          (``True``) or adjacency / connectivity (``False``).
      :type is_distance: bool, default ``False``
      :param min_cells: Minimum number of cells with non-zero value for a feature to be
                        tested. Clamped to ``[1, n_obs]``.
      :type min_cells: int, default 1
      :param min_cells_frac: If provided, overrides ``min_cells`` with
                             ``max(1, int(min_cells_frac * n_obs))``.
      :type min_cells_frac: float, optional

      :returns: **self**
      :rtype: DetectorIrregular
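
      The threshold rules documented for ``min_cells`` and ``min_cells_frac`` can be sketched as below. The helper ``resolve_min_cells`` and the toy count matrix are hypothetical; the sketch only restates the documented clamping and override rules and is not the method's actual code.

      .. code-block:: python

         import numpy as np

         def resolve_min_cells(n_obs, min_cells=1, min_cells_frac=None):
             """Resolve the effective non-zero-count threshold (hypothetical helper)."""
             if min_cells_frac is not None:
                 # A fraction, when given, overrides the absolute count.
                 min_cells = max(1, int(min_cells_frac * n_obs))
             # Clamp to [1, n_obs].
             return int(min(max(min_cells, 1), n_obs))

         # Toy matrix: 4 cells x 3 features with 1, 2 and 4 non-zero entries.
         X = np.array([[0, 1, 2],
                       [0, 0, 3],
                       [0, 4, 5],
                       [1, 0, 6]])
         threshold = resolve_min_cells(X.shape[0], min_cells=2)
         keep = (X != 0).sum(axis=0) >= threshold   # features that pass the filter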


   .. py:attribute:: adata
      :type:  Any | None
      :value: None


      Reference to the input :class:`anndata.AnnData`, set by :meth:`setup_data`.


   .. py:attribute:: backend_
      :type:  str
      :value: 'matrix'


      Which backend will build the kernel — ``'matrix'`` or ``'nufft'``.


   .. py:attribute:: min_cells
      :type:  int | None
      :value: None


      Minimum non-zero-count threshold applied in :meth:`setup_data`.