
SATP

soundscapy.satp

Soundscape Attributes Translation (SATP) calculation module.

This module provides functions and classes for conducting the SATP analysis, based on the R implementation. Requires optional dependencies: model fitting calls the R CircE package through rpy2.

MODULE DESCRIPTION
circe

Circumplex SEM Analysis for Soundscape Attributes Translation Project (SATP).

CircE

soundscapy.satp.circe

Circumplex SEM Analysis for Soundscape Attributes Translation Project (SATP).

This module provides tools for analyzing soundscape perception data using circumplex Structural Equation Modeling (SEM). It includes data validation schemas, model fitting classes, and analysis workflows for the Soundscape Attributes Translation Project.

The module supports four circumplex model types (unconstrained, equal angles, equal communalities, and full circumplex) and provides automated data preprocessing, including within-person centering: fit_circe applies grand-mean centering per participant by default, while the deprecated person_center applies column-wise centering.

FUNCTION DESCRIPTION
normalize_polar_angles

Correct reflected polar-angle solutions to canonical orientation

person_center

Column-wise within-participant centering of PAQ ratings (deprecated)

fit_circe

Fit circumplex SEM models and return a CircEResults collection

CLASS DESCRIPTION
CircModelE : Enum

Enumeration of available circumplex model types

SATPSchema : DataFrameModel

Pandera schema for validating SATP data format

CircE : dataclass

Results container for a fitted circumplex model

CircEResults : dataclass

Collection of fitted CircE models returned by fit_circe

CircModelE

Bases: StrEnum

Enumeration of circumplex model types.

ATTRIBUTE DESCRIPTION
equal_ang

Whether this model constrains all angles to be equally spaced.

TYPE: bool

equal_com

Whether this model constrains all communalities to be equal.

TYPE: bool

equal_ang property
equal_ang: bool

Whether this model constrains all angles to be equally spaced.

True for EQUAL_ANG and CIRCUMPLEX; False for UNCONSTRAINED and EQUAL_COM.

equal_com property
equal_com: bool

Whether this model constrains all communalities to be equal.

True for EQUAL_COM and CIRCUMPLEX; False for UNCONSTRAINED and EQUAL_ANG.
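Taken together, the two properties place the four model types on a 2×2 grid, and compute_bfgs_fit passes them straight through as the equal_ang and equal_com flags of the fit. A minimal sketch of that truth table follows; CircModelSketch is a hypothetical stand-in for CircModelE, and its member string values are placeholders because the real values are not listed here.

```python
from enum import Enum


class CircModelSketch(str, Enum):
    """Hypothetical stand-in for CircModelE; member values are placeholders."""

    UNCONSTRAINED = "unconstrained"
    EQUAL_ANG = "equal_ang"
    EQUAL_COM = "equal_com"
    CIRCUMPLEX = "circumplex"

    @property
    def equal_ang(self) -> bool:
        # Angles are constrained to equal spacing in EQUAL_ANG and CIRCUMPLEX.
        return self in (CircModelSketch.EQUAL_ANG, CircModelSketch.CIRCUMPLEX)

    @property
    def equal_com(self) -> bool:
        # Communalities are constrained to be equal in EQUAL_COM and CIRCUMPLEX.
        return self in (CircModelSketch.EQUAL_COM, CircModelSketch.CIRCUMPLEX)


for model in CircModelSketch:
    print(f"{model.name:<14} equal_ang={model.equal_ang!s:<5} equal_com={model.equal_com}")
```

CIRCUMPLEX is the only type with both constraints switched on, and UNCONSTRAINED the only one with neither.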

SATPSchema

Bases: DataFrameModel

Pandera schema for validating SATP (Soundscape Attributes Translation Project) data.

This schema validates DataFrame columns containing PAQ ratings and participant identifiers. PAQ ratings must be between 0 and 100.

CLASS DESCRIPTION
Config

Configuration for the schema validation behavior.

METHOD DESCRIPTION
column_alias

Parse and rename DataFrame columns to match the schema.

Config

Configuration for the schema validation behavior.

column_alias
column_alias(df: DataFrame) -> DataFrame

Parse and rename DataFrame columns to match the schema.

Uses _COLUMN_ALIASES (module-level constant) for a single-pass case-insensitive lookup. Handles PAQ label names (e.g. "pleasant" or "Pleasant" → "PAQ1"), PAQ IDs in any case ("paq1" → "PAQ1"), and the participant field in any capitalisation ("PARTICIPANT" → "participant").

PARAMETER DESCRIPTION
df

Input DataFrame to rename columns for

TYPE: DataFrame

RETURNS DESCRIPTION
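DataFrame

The input DataFrame with recognised columns renamed to their canonical schema names; unrecognised columns are left unchanged.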
Source code in src/soundscapy/satp/circe.py
@pa.dataframe_parser
def column_alias(cls, df: DataFrame) -> DataFrame:  # noqa: N805
    """
    Parse and rename DataFrame columns to match the schema.

    Uses ``_COLUMN_ALIASES`` (module-level constant) for a single-pass
    case-insensitive lookup.  Handles PAQ label names (e.g. ``"pleasant"``
    or ``"Pleasant"`` → ``"PAQ1"``), PAQ IDs in any case (``"paq1"`` →
    ``"PAQ1"``), and the participant field in any capitalisation
    (``"PARTICIPANT"`` → ``"participant"``).

    Parameters
    ----------
    df
        Input DataFrame to rename columns for

    Returns
    -------
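    :
        The input DataFrame with recognised columns renamed to their
        canonical schema names; unrecognised columns are left unchanged.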

    """
    rename_dict = {
        col: _COLUMN_ALIASES[col.lower()]
        for col in df.columns
        if col.lower() in _COLUMN_ALIASES
    }
    return df.rename(columns=rename_dict)
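The single-pass lookup can be sketched with plain pandas. ALIASES below is a hypothetical subset of _COLUMN_ALIASES, assuming the conventional PAQ1-PAQ8 label order (pleasant, vibrant, eventful, chaotic, annoying, monotonous, uneventful, calm); the real constant may contain more entries.

```python
import pandas as pd

# Hypothetical subset of _COLUMN_ALIASES: lower-cased source name -> schema name.
PAQ_LABELS = ["pleasant", "vibrant", "eventful", "chaotic",
              "annoying", "monotonous", "uneventful", "calm"]
ALIASES: dict[str, str] = {
    **{label: f"PAQ{i}" for i, label in enumerate(PAQ_LABELS, start=1)},
    **{f"paq{i}": f"PAQ{i}" for i in range(1, 9)},
    "participant": "participant",
}


def alias_columns(df: pd.DataFrame) -> pd.DataFrame:
    # One pass over the columns: lower-case each name and look it up.
    rename = {col: ALIASES[col.lower()] for col in df.columns if col.lower() in ALIASES}
    return df.rename(columns=rename)


raw = pd.DataFrame(columns=["Pleasant", "paq2", "PARTICIPANT", "notes"])
print(list(alias_columns(raw).columns))  # ['PAQ1', 'PAQ2', 'participant', 'notes']
```

Because DataFrame.rename ignores names that are not in the mapping, unknown columns such as notes pass through unchanged rather than raising.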

CircE dataclass

CircE(
    model: CircModelE,
    datasource: str,
    language: str,
    n: int,
    m: int | None,
    chisq: float | None,
    d: int | None,
    p: float | None,
    cfi: float | None,
    gfi: float | None,
    agfi: float | None,
    srmr: float | None,
    mcsc: float | None,
    rmsea: float | None,
    rmsea_l: float | None,
    rmsea_u: float | None,
    polar_angles: Series | None = None,
)

Results container for a fitted CircE (circumplex SEM) model.

ATTRIBUTE DESCRIPTION
model

The circumplex model type that was fitted.

TYPE: CircModelE

datasource

Source identifier for the dataset.

TYPE: str

language

Language code for the dataset.

TYPE: str

n

Number of observations (complete cases) used to fit the model.

TYPE: int

m

Number of common factors.

TYPE: int | None

chisq

Chi-squared fit statistic.

TYPE: float | None

d

Model degrees of freedom.

TYPE: int | None

p

p-value for the chi-squared statistic.

TYPE: float | None

cfi

Comparative Fit Index.

TYPE: float | None

gfi

Goodness of Fit Index.

TYPE: float | None

agfi

Adjusted Goodness of Fit Index.

TYPE: float | None

srmr

Standardised Root Mean Square Residual.

TYPE: float | None

mcsc

Mean Communality Squared Cosines.

TYPE: float | None

rmsea

Root Mean Square Error of Approximation.

TYPE: float | None

rmsea_l

Lower bound of the 90% confidence interval for RMSEA.

TYPE: float | None

rmsea_u

Upper bound of the 90% confidence interval for RMSEA.

TYPE: float | None

polar_angles

Estimated polar angles (degrees) for each PAQ item, with PAQ_IDS as the index. Only available for models with free angle parameters (UNCONSTRAINED, EQUAL_COM). None for EQUAL_ANG and CIRCUMPLEX.

TYPE: Series | None

METHOD DESCRIPTION
from_bfgs

Create a CircE instance from BFGS fit output.

compute_bfgs_fit

Compute and return a CircE from the given correlation matrix.

to_dict

Return all model fit statistics as a flat dictionary.

gdiff property
gdiff: float | None

RMSD between fitted polar angles and ideal circumplex spacing.

Measures how closely the freely estimated angles match ideal circumplex positions spaced 45° apart. Only defined for models with free angles (UNCONSTRAINED, EQUAL_COM); returns None for EQUAL_ANG and CIRCUMPLEX (where polar_angles is None).

A smaller value indicates better agreement with circumplex structure.

RETURNS DESCRIPTION
float | None

Rounded RMSD value (2 decimal places), or None if angles are fixed by the model.
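A minimal sketch of the calculation, assuming a plain root-mean-square difference over all eight items against ideal angles of 0°, 45°, ..., 315°. The exact formula used by gdiff (for example, whether the anchored PAQ1 is included, or how wrap-around near 360° is handled) is not shown here, and gdiff_sketch is a hypothetical name.

```python
from __future__ import annotations

import math

# Ideal circumplex positions for PAQ1..PAQ8: 0, 45, ..., 315 degrees.
IDEAL_ANGLES = [45.0 * i for i in range(8)]


def gdiff_sketch(angles: list[float] | None) -> float | None:
    """Root-mean-square deviation (degrees) from the ideal angles, 2 dp."""
    if angles is None:
        # Fixed-angle models (EQUAL_ANG, CIRCUMPLEX) have nothing to compare.
        return None
    squared = [(a - ideal) ** 2 for a, ideal in zip(angles, IDEAL_ANGLES)]
    return round(math.sqrt(sum(squared) / len(squared)), 2)


print(gdiff_sketch([0.0, 50.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]))  # 1.77
```

A single 5° deviation contributes 25 to the sum of squares, giving sqrt(25 / 8) ≈ 1.77.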

from_bfgs classmethod
from_bfgs(
    fit_stats: Mapping[str, Any] | ListVector,
    datasource: str,
    language: str,
    circ_model: CircModelE,
    n: int,
) -> CircE

Create a CircE instance from BFGS fit output.

Deprecated v0.8.4

Passing an rpy2 ListVector as fit_stats is deprecated and will be removed in a future release. Call soundscapy.r_wrapper.bfgs_fit to obtain a dict and pass that instead.

PARAMETER DESCRIPTION
fit_stats

Either a pre-extracted dict of fit statistics (new path, as returned by soundscapy.r_wrapper.bfgs_fit), or the raw rpy2 ListVector returned by the embedded CircE.BFGS R function (deprecated; pass a dict instead).

TYPE: Mapping[str, Any] | ListVector

datasource

Source identifier for the dataset.

TYPE: str

language

Language code for the dataset.

TYPE: str

circ_model

Circumplex model type that was fitted.

TYPE: CircModelE

n

Number of observations used to compute the correlation matrix.

TYPE: int

Source code in src/soundscapy/satp/circe.py
@classmethod
def from_bfgs(
    cls,
    fit_stats: "Mapping[str, Any] | rpy2.robjects.ListVector",
    datasource: str,
    language: str,
    circ_model: CircModelE,
    n: int,
) -> "CircE":
    """Create a CircE instance from BFGS fit output.

    !!! warning "Deprecated v0.8.4"
        Passing an rpy2 ``ListVector`` as *fit_stats* is deprecated and
        will be removed in a future release.
        Call :func:`soundscapy.r_wrapper.bfgs_fit` to obtain a dict and
        pass that instead.

    Parameters
    ----------
    fit_stats
        Either a pre-extracted ``dict`` of fit statistics (new path, as
        returned by :func:`soundscapy.r_wrapper.bfgs_fit`), or the raw
        rpy2 ``ListVector`` returned by the embedded ``CircE.BFGS`` R
        function (deprecated — pass a dict instead).
    datasource
        Source identifier for the dataset.
    language
        Language code for the dataset.
    circ_model
        Circumplex model type that was fitted.
    n
        Number of observations used to compute the correlation matrix.

    """
    if not isinstance(fit_stats, Mapping):
        warnings.warn(
            "Passing an rpy2 model object to CircE.from_bfgs() is deprecated. "
            "Use soundscapy.r_wrapper.bfgs_fit() to obtain a fit-stats dict "
            "and pass that instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        from soundscapy.r_wrapper._r_wrapper import _extract_bfgs_stats  # noqa: PLC0415

        fit_stats = _extract_bfgs_stats(fit_stats)

    polar_angles = None
    # Only extract polar angles for models where angles are free parameters.
    # The R key is "polar.angles" (dot), not "polar_angles" (underscore).
    if circ_model in (CircModelE.UNCONSTRAINED, CircModelE.EQUAL_COM):
        raw_pa = fit_stats.get("polar.angles")
        if raw_pa is not None:
            # raw_pa is a DataFrame with index=PAQ_IDS and columns from
            # CircE_BFGS: ["estimates", "(L;", "U)"].  Use the label
            # "estimates" directly; fall back to first column if the
            # CircE API ever changes its output names.
            pa_df = pd.DataFrame(raw_pa)
            if "estimates" in pa_df.columns:
                estimates = pa_df["estimates"].to_numpy()
            else:
                estimates = pa_df.iloc[:, 0].to_numpy()
            polar_angles = normalize_polar_angles(
                pd.Series(estimates, index=PAQ_IDS)
            )

    return cls(
        model=circ_model,
        datasource=datasource,
        language=language,
        n=n,
        m=fit_stats.get("m", None),
        chisq=fit_stats.get("chisq", None),
        d=fit_stats.get("d", None),
        p=fit_stats.get("p", None),
        cfi=fit_stats.get("cfi", None),
        gfi=fit_stats.get("gfi", None),
        agfi=fit_stats.get("agfi", None),
        srmr=fit_stats.get("srmr", None),
        mcsc=fit_stats.get("mcsc", None),
        rmsea=fit_stats.get("rmsea", None),
        rmsea_l=fit_stats.get("rmsea.l", None),
        rmsea_u=fit_stats.get("rmsea.u", None),
        polar_angles=polar_angles,
    )
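The key handling above can be sketched without R: dotted R names (rmsea.l, rmsea.u, polar.angles) map to underscore field names, absent statistics become None, and polar angles are read only for free-angle models. extract_fields, FIELD_KEYS and the toy numbers are hypothetical, and the sketch skips the normalize_polar_angles step that from_bfgs applies.

```python
from __future__ import annotations

from typing import Any

import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]

# Dataclass field name -> key in the R result (R uses dots, Python underscores).
FIELD_KEYS = {
    "m": "m", "chisq": "chisq", "d": "d", "p": "p",
    "cfi": "cfi", "gfi": "gfi", "agfi": "agfi", "srmr": "srmr",
    "mcsc": "mcsc", "rmsea": "rmsea",
    "rmsea_l": "rmsea.l", "rmsea_u": "rmsea.u",
}


def extract_fields(fit_stats: dict[str, Any], free_angles: bool) -> dict[str, Any]:
    # dict.get returns None for statistics the fit did not report.
    out: dict[str, Any] = {field: fit_stats.get(key) for field, key in FIELD_KEYS.items()}
    out["polar_angles"] = None
    raw = fit_stats.get("polar.angles") if free_angles else None
    if raw is not None:
        pa = pd.DataFrame(raw)
        # Prefer the labelled "estimates" column; fall back to the first column.
        col = pa["estimates"] if "estimates" in pa.columns else pa.iloc[:, 0]
        out["polar_angles"] = pd.Series(col.to_numpy(), index=PAQ_IDS)
    return out


# Toy input shaped like the dict described above (numbers are made up).
angles = pd.DataFrame(
    {
        "estimates": [0.0, 44.0, 91.0, 133.0, 181.0, 226.0, 268.0, 316.0],
        "(L;": [0.0] * 8,
        "U)": [0.0] * 8,
    },
    index=PAQ_IDS,
)
fields = extract_fields({"chisq": 12.5, "rmsea.l": 0.01, "polar.angles": angles}, free_angles=True)
print(fields["rmsea_l"], fields["cfi"], fields["polar_angles"]["PAQ3"])  # 0.01 None 91.0
```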
compute_bfgs_fit classmethod
compute_bfgs_fit(
    data_cor: DataFrame,
    n: int,
    datasource: str,
    language: str,
    circ_model: CircModelE,
) -> CircE

Compute and return a CircE from the given correlation matrix.

PARAMETER DESCRIPTION
data_cor

Correlation matrix of the PAQ data (8x8).

TYPE: DataFrame

n

Number of observations used to compute data_cor. This is used by CircE_BFGS for chi-square and RMSEA calculations and must be the row count of the complete-case data.

TYPE: int

datasource

Source identifier for the dataset.

TYPE: str

language

Language code for the dataset.

TYPE: str

circ_model

Circumplex model type to fit.

TYPE: CircModelE

Examples:

>>> import soundscapy as sspy
>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> data = sspy.isd.load()
>>> data_paqs = data[PAQ_IDS]
>>> data_paqs = data_paqs.dropna()
>>> data_cor = data_paqs.corr()
>>> n = len(data_paqs)
>>> circ_model = sspy.satp.CircModelE.CIRCUMPLEX
>>> circe_res = sspy.satp.CircE.compute_bfgs_fit(
...     data_cor, n, "ISD", "EN", circ_model)
Source code in src/soundscapy/satp/circe.py
@classmethod
def compute_bfgs_fit(
    cls,
    data_cor: pd.DataFrame,
    n: int,
    datasource: str,
    language: str,
    circ_model: CircModelE,
) -> "CircE":
    """
    Compute and return a CircE from the given correlation matrix.

    Parameters
    ----------
    data_cor
        Correlation matrix of the PAQ data (8x8).
    n
        Number of observations used to compute ``data_cor``.
        This is used by ``CircE_BFGS`` for chi-square and RMSEA calculations
        and must be the row count of the *complete-case* data.
    datasource
        Source identifier for the dataset.
    language
        Language code for the dataset.
    circ_model
        Circumplex model type to fit.

    Examples
    --------
    >>> import soundscapy as sspy
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS
    >>> data = sspy.isd.load()
    >>> data_paqs = data[PAQ_IDS]
    >>> data_paqs = data_paqs.dropna()
    >>> data_cor = data_paqs.corr()
    >>> n = len(data_paqs)
    >>> circ_model = sspy.satp.CircModelE.CIRCUMPLEX
    >>> circe_res = sspy.satp.CircE.compute_bfgs_fit(
    ...     data_cor, n, "ISD", "EN", circ_model)

    """
    fit_stats = sspyr.bfgs_fit(
        data_cor=data_cor,
        n=n,
        scales=PAQ_IDS,
        m_val=3,
        equal_ang=circ_model.equal_ang,
        equal_com=circ_model.equal_com,
    )
    return cls.from_bfgs(fit_stats, datasource, language, circ_model, n)
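Because n must match the rows behind data_cor, it helps to derive both from the same complete-case frame, as fit_circe does. A sketch with a hypothetical bfgs_inputs helper and random toy data; note that pandas DataFrame.corr() excludes missing values pairwise, so dropping incomplete rows first is what makes the matrix listwise.

```python
import numpy as np
import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def bfgs_inputs(data: pd.DataFrame) -> tuple[pd.DataFrame, int]:
    # Listwise deletion first: otherwise each correlation could rest on a
    # different subset of rows and no single n would describe the matrix.
    complete = data[PAQ_IDS].dropna()
    return complete.corr(), len(complete)


rng = np.random.default_rng(0)
data = pd.DataFrame(rng.uniform(0, 100, size=(10, 8)), columns=PAQ_IDS)
data.iloc[3, 2] = np.nan  # one missing rating
corr, n = bfgs_inputs(data)
print(corr.shape, n, len(data))  # (8, 8) 9 10
```

Passing len(data) (10) instead of n (9) would overstate the sample size used for the chi-square and RMSEA calculations.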
to_dict
to_dict() -> dict[str, Any]

Return all model fit statistics as a flat dictionary.

Polar angle columns (PAQ1-PAQ8) are expanded as individual keys. For models with fixed angles (EQUAL_ANG, CIRCUMPLEX), PAQ values are None.

RETURNS DESCRIPTION
dict[str, Any]

Flat dictionary suitable for constructing a pandas DataFrame row.

Source code in src/soundscapy/satp/circe.py
def to_dict(self) -> dict[str, Any]:
    """
    Return all model fit statistics as a flat dictionary.

    Polar angle columns (PAQ1-PAQ8) are expanded as individual keys.
    For models with fixed angles (EQUAL_ANG, CIRCUMPLEX), PAQ values
    are ``None``.

    Returns
    -------
    :
        Flat dictionary suitable for constructing a pandas DataFrame row.

    """
    base = {
        "datasource": self.datasource,
        "language": self.language,
        "model": self.model.value,
        "n": self.n,
        "m": self.m,
        "chisq": self.chisq,
        "d": self.d,
        "p": self.p,
        "cfi": self.cfi,
        "gfi": self.gfi,
        "agfi": self.agfi,
        "srmr": self.srmr,
        "mcsc": self.mcsc,
        "rmsea": self.rmsea,
        "rmsea_l": self.rmsea_l,
        "rmsea_u": self.rmsea_u,
        "gdiff": self.gdiff,
    }
    if self.polar_angles is not None:
        base.update(self.polar_angles.to_dict())  # type: ignore [no-matching-overload]
    else:
        base.update(dict.fromkeys(PAQ_IDS))
    return base
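The point of filling PAQ1-PAQ8 with None for fixed-angle models is that every row carries the same keys, whichever model produced it. A minimal sketch with a hypothetical flatten helper and placeholder model names:

```python
from __future__ import annotations

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def flatten(stats: dict, polar_angles: dict[str, float] | None) -> dict:
    row = dict(stats)
    # Fixed-angle models still get PAQ1..PAQ8 keys, all mapped to None, so
    # every row has the same keys whichever model produced it.
    row.update(polar_angles if polar_angles is not None else dict.fromkeys(PAQ_IDS))
    return row


free = flatten({"model": "equal_com"}, {p: 45.0 * i for i, p in enumerate(PAQ_IDS)})
fixed = flatten({"model": "circumplex"}, None)
print(free.keys() == fixed.keys(), fixed["PAQ5"])  # True None
```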

CircEResults dataclass

CircEResults(
    models: list[CircE],
    language: str,
    datasource: str,
    error_rows: list[dict] = list(),
)

Collection of fitted CircE models returned by fit_circe.

Holds both successfully-fitted CircE instances and any error rows from models that failed to converge. Access the full tidy DataFrame via table; access individual model results via for_model.

ATTRIBUTE DESCRIPTION
models

Successfully-fitted CircE results, in fitting order.

TYPE: list[CircE]

language

Language code passed to fit_circe.

TYPE: str

datasource

Dataset identifier passed to fit_circe.

TYPE: str

error_rows

Dicts for model runs that raised an exception during fitting. Each dict contains language, datasource, model, n, and an error key with the exception message.

TYPE: list[dict]

METHOD DESCRIPTION
__len__

Total number of model runs (successful + failed).

for_model

Return the fitted CircE result for a specific model type.

table property
table: DataFrame

Full tidy DataFrame of all model fit statistics.

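One row per model (including error rows). Columns are the keys returned by CircE.to_dict (fit statistics, gdiff and the PAQ1-PAQ8 polar angles), plus an error column that is populated only for failed runs. Integer columns (n, d, m) use the pandas nullable Int64 dtype so that None in error rows does not promote the whole column to float64.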
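The dtype issue is easy to reproduce: when a column mixes integers and None, pandas falls back to float64, and casting to the nullable Int64 dtype restores integer values alongside a missing-value marker. A sketch with toy rows (the model names and numbers are placeholders, not library output):

```python
import pandas as pd

rows = [
    {"model": "circumplex", "n": 500, "m": 3, "chisq": 41.2, "d": 20},
    {"model": "unconstrained", "n": 500, "m": None, "chisq": None, "d": None,
     "error": "did not converge"},
]

# Without a cast, the None in the error row turns d and m into float64.
naive = pd.DataFrame(rows)
table = naive.astype({"n": "Int64", "m": "Int64", "d": "Int64"})

print(naive["d"].dtype, table["d"].dtype)  # float64 Int64
```

The success row's missing error key simply becomes a missing value, so both kinds of row share one set of columns.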

__len__
__len__() -> int

Total number of model runs (successful + failed).

Source code in src/soundscapy/satp/circe.py
def __len__(self) -> int:
    """Total number of model runs (successful + failed)."""
    return len(self.models) + len(self.error_rows)
for_model
for_model(model: CircModelE) -> CircE

Return the fitted CircE result for a specific model type.

PARAMETER DESCRIPTION
model

The CircModelE variant to retrieve.

TYPE: CircModelE

RAISES DESCRIPTION
KeyError

If no successful result exists for the requested model (e.g. it failed to converge).

Source code in src/soundscapy/satp/circe.py
def for_model(self, model: CircModelE) -> CircE:
    """
    Return the fitted `CircE` result for a specific model type.

    Parameters
    ----------
    model
        The `CircModelE` variant to retrieve.

    Raises
    ------
    KeyError
        If no successful result exists for the requested model (e.g. it
        failed to converge).

    """
    for m in self.models:
        if m.model is model:
            return m
    msg = f"No successful result for model {model.value!r}"
    raise KeyError(msg)

normalize_polar_angles

normalize_polar_angles(angles: Series) -> pd.Series

Return polar angles in canonical (counter-clockwise) orientation.

CircE's BFGS optimisation may converge to a mathematically equivalent reflected solution in which the PAQ attributes are arranged in clockwise (decreasing) order rather than the canonical counter-clockwise (increasing) order. Both solutions fit the correlation data equally well, but the reflected form is inconsistent with the standard circumplex ordering (pleasant → vibrant → eventful → …) and will produce incorrect GDIFF values if compared against the ideal equally-spaced angles.

Detection uses a monotonicity check on the first three angles after PAQ1: if the angle for PAQ2 exceeds PAQ3, or PAQ3 exceeds PAQ4, the solution is reflected. This is more robust than a threshold-on-sum heuristic because it tests the structural property of the orientation directly.

When reflection is detected, 360 - angle is applied to PAQ2-PAQ8. PAQ1 is anchored at 0° and is left unchanged.

PARAMETER DESCRIPTION
angles

Series of polar angle estimates (degrees) with PAQ_IDS as the index. Typically the polar_angles attribute of a CircE instance.

TYPE: Series

RETURNS DESCRIPTION
Series

Polar angles in canonical (counter-clockwise) orientation, with the same index as the input.

Examples:

>>> import pandas as pd
>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> from soundscapy.satp.circe import normalize_polar_angles
>>> reflected = pd.Series(
...     [0.0, 315.0, 270.0, 225.0, 180.0, 135.0, 90.0, 45.0],
...     index=PAQ_IDS)
>>> normalize_polar_angles(reflected).tolist()
[0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]
Source code in src/soundscapy/satp/circe.py
def normalize_polar_angles(angles: pd.Series) -> pd.Series:
    """
    Return polar angles in canonical (counter-clockwise) orientation.

    CircE's BFGS optimisation may converge to a mathematically equivalent
    *reflected* solution in which the PAQ attributes are arranged in clockwise
    (decreasing) order rather than the canonical counter-clockwise (increasing)
    order.  Both solutions fit the correlation data equally well, but the
    reflected form is inconsistent with the standard circumplex ordering
    (pleasant → vibrant → eventful → …) and will produce incorrect GDIFF values
    if compared against the ideal equally-spaced angles.

    Detection uses a monotonicity check on the first three angles after PAQ1:
    if the angle for PAQ2 exceeds PAQ3, or PAQ3 exceeds PAQ4, the solution is
    reflected.  This is more robust than a threshold-on-sum heuristic because
    it tests the structural property of the orientation directly.

    When reflection is detected, ``360 - angle`` is applied to PAQ2-PAQ8.
    PAQ1 is anchored at 0° and is left unchanged.

    Parameters
    ----------
    angles
        Series of polar angle estimates (degrees) with PAQ_IDS as the index.
        Typically the ``polar_angles`` attribute of a `CircE` instance.

    Returns
    -------
    :
        Polar angles in canonical (counter-clockwise) orientation, with the
        same index as the input.

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS
    >>> import pandas as pd
    >>> reflected = pd.Series(
    ...     [0.0, 315.0, 270.0, 225.0, 180.0, 135.0, 90.0, 45.0],
    ...     index=PAQ_IDS)
    >>> normalize_polar_angles(reflected).tolist()
    [0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0]

    """
    # Monotonicity check (positional): canonical ordering has PAQ2 < PAQ3 < PAQ4.
    if angles.iloc[1] > angles.iloc[2] or angles.iloc[2] > angles.iloc[3]:
        corrected = angles.copy()
        corrected.iloc[1:] = 360 - corrected.iloc[1:]
        return corrected
    return angles
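Why the two orientations fit equally well can be checked numerically. Assuming Browne's circumplex model, in which the implied correlations depend on the angles only through cos(k(θᵢ − θⱼ)) for integer k, reflecting every angle flips the sign of each pairwise difference (mod 360°), and cosine is even, so nothing changes. The sketch below verifies this for k = 1, 2, 3 and then applies the same monotonicity test and 360 − angle correction as normalize_polar_angles, using plain NumPy arrays rather than a pandas Series.

```python
import numpy as np

canonical = np.array([0.0, 45.0, 90.0, 135.0, 180.0, 225.0, 270.0, 315.0])
# Reflect every angle except the anchored first one.
reflected = canonical.copy()
reflected[1:] = 360.0 - reflected[1:]


def cos_diffs(theta_deg: np.ndarray, k: int = 1) -> np.ndarray:
    # Matrix of cos(k * (theta_i - theta_j)) over all item pairs.
    t = np.deg2rad(theta_deg)
    return np.cos(k * (t[:, None] - t[None, :]))


# Every cosine term is identical for the two orientations.
same = all(np.allclose(cos_diffs(canonical, k), cos_diffs(reflected, k)) for k in (1, 2, 3))
print(same)  # True

# Monotonicity check on PAQ2..PAQ4, then the 360 - angle correction.
is_reflected = reflected[1] > reflected[2] or reflected[2] > reflected[3]
restored = reflected.copy()
if is_reflected:
    restored[1:] = 360.0 - restored[1:]
print(np.allclose(restored, canonical))  # True
```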

person_center

person_center(
    data: DataFrame, by: str = "participant"
) -> pd.DataFrame

Center PAQ ratings within each participant (column-wise within-person centering).

Deprecated v0.8.0

Use soundscapy.surveys.ipsatize with method="column_wise" instead. For the centering that matches the published SATP analysis, use method="grand_mean" (the default of soundscapy.surveys.ipsatize).

This function applies column-wise centering: for every PAQ column independently, each participant's mean across their observations is subtracted (8 centering scalars per participant).

Note

This is not the centering described in the original SATP R implementation (Aletta et al., 2024), which applies grand-mean centering (one scalar per participant across all PAQ columns and observations). Use soundscapy.surveys.ipsatize with method="grand_mean" to match the R reference implementation.

PARAMETER DESCRIPTION
data

DataFrame containing PAQ columns and a participant grouping column.

TYPE: DataFrame

by

Column to group by for centering. Default is "participant".

TYPE: str DEFAULT: 'participant'

RETURNS DESCRIPTION
DataFrame

DataFrame containing only the PAQ columns (not by), with column-wise participant-centred values.

Source code in src/soundscapy/satp/circe.py
def person_center(data: pd.DataFrame, by: str = "participant") -> pd.DataFrame:
    """
    Center PAQ ratings within each participant (column-wise within-person centering).

    !!! warning "Deprecated v0.8.0"
        Use `soundscapy.surveys.ipsatize` with ``method="column_wise"``
        instead.  For the centering that matches the published SATP analysis,
        use ``method="grand_mean"`` (the default of
        `~soundscapy.surveys.ipsatize`).

    This function applies **column-wise** centering: for every PAQ column
    independently, each participant's mean across their observations is
    subtracted (8 centering scalars per participant).

    !!! note
        This is *not* the centering described in the original SATP R
        implementation (Aletta et al., 2024), which applies grand-mean
        centering (one scalar per participant across all PAQ columns and
        observations).  Use `soundscapy.surveys.ipsatize` with
        ``method="grand_mean"`` to match the R reference implementation.

    Parameters
    ----------
    data
        DataFrame containing PAQ columns and a participant grouping column.
    by
        Column to group by for centering. Default is ``"participant"``.

    Returns
    -------
    :
        DataFrame containing only the PAQ columns (not ``by``), with
        column-wise participant-centred values.

    """
    import warnings  # noqa: PLC0415

    warnings.warn(
        "person_center() is deprecated; use ipsatize(method='column_wise') instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return ipsatize(data, method="column_wise", participant_col=by)
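The difference between the two centering methods is clearest on a toy example. The sketch below is not the ipsatize implementation; it computes both variants directly with pandas groupby on two PAQ columns (a simplification of the eight) for two hypothetical participants.

```python
import pandas as pd

PAQS = ["PAQ1", "PAQ2"]  # two items keep the arithmetic readable
df = pd.DataFrame({
    "participant": ["A", "A", "B"],
    "PAQ1": [10.0, 20.0, 50.0],
    "PAQ2": [30.0, 40.0, 50.0],
})

# Column-wise: subtract each participant's mean of that column (one scalar per item).
col_wise = df[PAQS] - df.groupby("participant")[PAQS].transform("mean")

# Grand-mean: subtract one scalar per participant, the mean over all of their
# non-missing PAQ cells (sum of cells / count of cells).
grouped = df.groupby("participant")[PAQS]
grand = grouped.sum().sum(axis=1) / grouped.count().sum(axis=1)
grand_mean = df[PAQS].sub(df["participant"].map(grand), axis=0)

print(col_wise.to_numpy().tolist())    # [[-5.0, -5.0], [5.0, 5.0], [0.0, 0.0]]
print(grand_mean.to_numpy().tolist())  # [[-15.0, 5.0], [-5.0, 15.0], [0.0, 0.0]]
```

Column-wise centering zeroes every item mean within a participant, so participant A's 20-point preference for PAQ2 over PAQ1 disappears; grand-mean centering removes only A's overall rating level (25) and keeps that within-person difference.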

fit_circe

fit_circe(
    data: DataFrame,
    language: str,
    datasource: str,
    *,
    models: list[CircModelE] | None = None,
    center_by_participant: bool = True,
    errors: Literal["raise", "warn"] = "raise",
) -> CircEResults

Fit circumplex SEM models to PAQ data and return the results as a CircEResults collection.

Validates input data, optionally applies grand-mean within-person centering (matching the published SATP analysis), computes a complete-case correlation matrix, and fits each requested variant of Browne's circumplex model by BFGS optimisation, using the R CircE package.

PARAMETER DESCRIPTION
data

DataFrame with PAQ1-PAQ8 and a participant column. Column aliases (e.g. PAQ label names, Participant) are accepted and renamed automatically by the schema validator.

TYPE: DataFrame

language

Language code for the dataset (e.g. "eng", "fra"). Stored in the results; not used for computation.

TYPE: str

datasource

Dataset identifier (e.g. "SATP", "ISD"). Stored in the results; not used for computation.

TYPE: str

models

List of model types to fit. Default: all four CircModelE variants. Passing [] returns an empty CircEResults (len(result) == 0).

TYPE: list[CircModelE] | None DEFAULT: None

center_by_participant

Whether to apply grand-mean within-person centering (via soundscapy.surveys.ipsatize with method="grand_mean") before fitting. Set to False to fit the validated ratings without centering.

TYPE: bool DEFAULT: True

Note

Schema validation runs before the centering step and requires PAQ values in [0, 100]. Pass raw [0, 100]-range ratings and keep the default center_by_participant=True; data that has already been centered contains negative values and will be rejected by the validator (or, with errors="warn", have those rows dropped) whether or not centering is disabled.

errors

How to handle rows that fail schema validation (PAQ values outside [0, 100], missing required columns, etc.):

"raise" (default): raise a pandera.errors.SchemaErrors immediately, listing every failing row and constraint.

"warn": emit a UserWarning describing the failing rows and continue with the valid rows only.

TYPE: Literal['raise', 'warn'] DEFAULT: 'raise'

RETURNS DESCRIPTION
CircEResults

Collection of fitted models. Access the tidy DataFrame via .table; access individual model results via .for_model(). Failed models are stored in .error_rows and included in .table.

Examples:

>>> import soundscapy as sspy
>>> from soundscapy.satp import fit_circe
>>> data = sspy.isd.load()
>>> data = data.rename(columns={'SessionID': 'participant'})
>>> results = fit_circe(data, language='eng', datasource='ISD', errors='warn')
>>> len(results)
4
Source code in src/soundscapy/satp/circe.py
def fit_circe(
    data: pd.DataFrame,
    language: str,
    datasource: str,
    *,
    models: list[CircModelE] | None = None,
    center_by_participant: bool = True,
    errors: Literal["raise", "warn"] = "raise",
) -> "CircEResults":
    """
    Fit circumplex SEM models to PAQ data and return a `CircEResults` collection.

    Validates input data, optionally applies grand-mean within-person centering
    (matching the published SATP analysis), computes a complete-case correlation
    matrix, and fits each requested variant of Browne's circumplex model by
    BFGS optimisation, using the R ``CircE`` package.

    Parameters
    ----------
    data
        DataFrame with PAQ1-PAQ8 and a ``participant`` column.
        Column aliases (e.g. PAQ label names, ``Participant``) are accepted
        and renamed automatically by the schema validator.
    language
        Language code for the dataset (e.g. ``"eng"``, ``"fra"``).
        Stored in the results; not used for computation.
    datasource
        Dataset identifier (e.g. ``"SATP"``, ``"ISD"``).
        Stored in the results; not used for computation.
    models
        List of model types to fit. Default: all four ``CircModelE`` variants.
        Passing ``[]`` returns an empty `CircEResults`
        (``len(result) == 0``).
    center_by_participant
        Whether to apply grand-mean within-person centering (via
        `~soundscapy.surveys.ipsatize` with ``method="grand_mean"``)
        before fitting.  Set to ``False`` to fit the validated ratings
        without centering.

        !!! note
            Schema validation runs before the centering step and requires
            PAQ values in ``[0, 100]``.  Pass raw ``[0, 100]``-range ratings
            and keep the default ``center_by_participant=True``; data that
            has already been centered contains negative values and will be
            rejected by the validator (or, with ``errors="warn"``, have
            those rows dropped) whether or not centering is disabled.
    errors
        How to handle rows that fail schema validation (PAQ values outside
        ``[0, 100]``, missing required columns, etc.):

        ``"raise"`` *(default)*: raise a `pandera.errors.SchemaErrors`
        immediately, listing every failing row and constraint.

        ``"warn"``: emit a `UserWarning` describing the failing rows
        and continue with the valid rows only.

    Returns
    -------
    :
        Collection of fitted models.  Access the tidy DataFrame via
        ``.table``; access individual model results via ``.for_model()``.
        Failed models are stored in ``.error_rows`` and included in
        ``.table``.

    Examples
    --------
    >>> import soundscapy as sspy
    >>> from soundscapy.satp import fit_circe
    >>> data = sspy.isd.load()
    >>> data = data.rename(columns={'SessionID': 'participant'})
    >>> results = fit_circe(data, language='eng', datasource='ISD', errors='warn')
    >>> len(results)
    4

    """
    warnings.warn(
        "The SATP analysis module is experimental. Use with caution.",
        UserWarning,
        stacklevel=2,
    )
    if len(data) == 0:
        msg = (
            "No complete cases found: input DataFrame is empty. "
            "Check that data contains valid rows with PAQ1-PAQ8 columns."
        )
        raise ValueError(msg)

    try:
        validated = SATPSchema.validate(data, lazy=True)
    except SchemaErrors as exc:
        if errors == "raise":
            raise
        bad_idx = exc.failure_cases["index"].dropna().unique()
        clean = data.loc[~data.index.isin(bad_idx)]
        warnings.warn(
            f"Dropping {len(data) - len(clean)} rows that failed schema validation "
            f"({len(clean)} rows remain). "
            "Pass errors='raise' to raise an error instead.",
            UserWarning,
            stacklevel=2,
        )
        try:
            validated = SATPSchema.validate(clean, lazy=True)
        except SchemaErrors as exc2:
            raise SchemaErrors(  # type: ignore [missing-argument]
                schema_errors=exc2.schema_errors,
                data=exc2.data,
            ) from exc2

    if center_by_participant and "participant" not in validated.columns:
        msg = (
            "center_by_participant=True requires a 'participant' column. "
            "Pass center_by_participant=False if your data is already centered."
        )
        raise ValueError(msg)
    processed = (
        ipsatize(validated, method="grand_mean", participant_col="participant")
        if center_by_participant
        else validated
    )

    # Use listwise deletion (complete cases only) — consistent with R's na.omit().
    complete = processed[PAQ_IDS].dropna()
    n = len(complete)
    if n == 0:
        msg = (
            "No complete cases found after validation and ipsatization. "
            "Check that PAQ1-PAQ8 are not all NaN and participant column is present."
        )
        raise ValueError(msg)
    corr = complete.corr()

    circ_models = models if models is not None else list(CircModelE)
    fitted: list[CircE] = []
    error_rows: list[dict] = []
    fit_exceptions = (ValueError, np.linalg.LinAlgError, RuntimeError, RRuntimeError)
    for model in circ_models:
        try:
            circe = CircE.compute_bfgs_fit(corr, n, datasource, language, model)
            fitted.append(circe)
        except fit_exceptions as e:
            warnings.warn(f"{model.value} raised {e}", stacklevel=2)
            # Populate all expected columns with None so that pandas does not
            # promote numeric columns (e.g. n, d) to float64 across all rows
            # when mixing sparse error dicts with full success dicts.
            error_rows.append(
                {
                    "language": language,
                    "datasource": datasource,
                    "model": model.value,
                    "n": n,
                    "m": None,
                    "chisq": None,
                    "d": None,
                    "p": None,
                    "cfi": None,
                    "gfi": None,
                    "agfi": None,
                    "srmr": None,
                    "mcsc": None,
                    "rmsea": None,
                    "rmsea_l": None,
                    "rmsea_u": None,
                    "gdiff": None,
                    **dict.fromkeys(PAQ_IDS),
                    "error": str(e),
                }
            )

    return CircEResults(
        models=fitted,
        language=language,
        datasource=datasource,
        error_rows=error_rows,
    )
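With errors="warn", fit_circe drops every row index listed in pandera's failure cases, re-validates the remainder, and only then applies listwise deletion. A pandas-only sketch of that sequence, assuming the sole row-level constraint is the [0, 100] range and that missing ratings pass validation; drop_out_of_range is hypothetical and does not replicate SATPSchema.

```python
import warnings

import pandas as pd

PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]


def drop_out_of_range(data: pd.DataFrame, low: float = 0.0, high: float = 100.0) -> pd.DataFrame:
    """Hypothetical stand-in for the errors="warn" path; not SATPSchema."""
    paqs = data[PAQ_IDS]
    # NaN compares False on both sides, so missing ratings are not flagged here.
    bad = ((paqs < low) | (paqs > high)).any(axis=1)
    clean = data.loc[~bad]
    if bad.any():
        warnings.warn(
            f"Dropping {int(bad.sum())} rows that failed validation "
            f"({len(clean)} rows remain).",
            UserWarning,
            stacklevel=2,
        )
    return clean


nan = float("nan")
data = pd.DataFrame(
    [[50.0] * 8, [120.0] + [50.0] * 7, [nan] + [50.0] * 7],
    columns=PAQ_IDS,
)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clean = drop_out_of_range(data)

# Listwise deletion happens only after validation.
complete = clean[PAQ_IDS].dropna()
print(len(clean), len(complete), len(caught))  # 2 1 1
```

The two steps remove different rows: the out-of-range row is dropped (with a warning) at validation, while the row with a missing rating survives validation and is removed only by dropna before the correlation matrix is computed.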