Bootstrap

circumplex.analysis.bootstrap

Bootstrap confidence interval calculation for SSM analysis.

This module implements bootstrap resampling with confidence interval calculation, including special handling for circular displacement data.

FUNCTION	DESCRIPTION
`circular_quantile`	Calculate quantiles for circular data in radians.
`ssm_bootstrap`	Perform stratified bootstrap with confidence intervals.
`calculate_confidence_intervals`	Calculate confidence intervals from bootstrap results.

circular_quantile

circular_quantile(angles: NDArray[Any, Float], probs: list[float] | NDArray[Shape[Any], Float]) -> NDArray[Shape[Any], Float]

Calculate quantiles for circular data in radians.

Implements a circular quantile method that accounts for the periodic nature of angular data. Centers angles around their mean direction, calculates linear quantiles, then transforms back.

PARAMETER	DESCRIPTION
`angles`	Array of angles in radians, shape (n,) TYPE: `NDArray[Any, Float]`
`probs`	Probability points at which to calculate quantiles (e.g., [0.025, 0.975]) TYPE: `list[float] \| NDArray[Shape[Any], Float]`

RETURNS	DESCRIPTION
`NDArray[Shape[Any], Float]`	Quantiles at the requested probability points

Examples:

>>> angles = np.array([0.1, 0.2, 6.2, 6.3])  # Two near 0, two near 2π
>>> circular_quantile(angles, [0.25, 0.75])
array([6.27..., 0.12...])

Notes

This function mirrors the quantile.circumplex_radian method from the R package (R/ssm_bootstrap.R lines 72-82). It:

1. Computes mean direction using atan2
2. Centers all angles around the mean
3. Calculates linear quantiles on centered data
4. Transforms back to [0, 2π)
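To illustrate why the centering step matters, the sketch below compares a plain linear quantile with the four steps above on the example angles. The function body restates the source listing on this page; the name `circular_quantile_sketch` is hypothetical and used only for illustration.

```python
import numpy as np


def circular_quantile_sketch(angles, probs):
    """Illustrative restatement of the four steps (not the packaged function)."""
    angles = np.asarray(angles, dtype=float)
    # 1. Mean direction from the averaged sine and cosine components
    mean_angle = np.arctan2(np.mean(np.sin(angles)), np.mean(np.cos(angles)))
    # 2. Center on the mean direction, wrapping into [-pi, pi)
    centered = (angles - mean_angle + np.pi) % (2 * np.pi) - np.pi
    # 3. Ordinary linear quantiles on the centered values
    q_centered = np.quantile(centered, probs)
    # 4. Shift back and wrap into [0, 2*pi)
    return (q_centered + mean_angle) % (2 * np.pi)


# All four angles sit within about 0.3 rad of each other on the circle
angles = np.array([0.1, 0.2, 6.2, 6.3])

# A linear quantile treats 0.2 and 6.2 as far apart
linear = np.quantile(angles, [0.25, 0.75])
circular = circular_quantile_sketch(angles, [0.25, 0.75])

# Counter-clockwise arc length from the lower to the upper quantile
linear_width = (linear[1] - linear[0]) % (2 * np.pi)
circular_width = (circular[1] - circular[0]) % (2 * np.pi)
```

With these inputs the linear interval spans almost the whole circle, while the circular interval stays a narrow arc around 0. Note that the lower bound can be numerically larger than the upper bound when the interval crosses 0.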

Source code in src/circumplex/analysis/bootstrap.py

def circular_quantile(
    angles: NDArray[Any, Float],
    probs: list[float] | NDArray[Shape[Any], Float],
) -> NDArray[Shape[Any], Float]:
    """Calculate quantiles for circular data in radians.

    Implements a circular quantile method that accounts for the periodic
    nature of angular data. Centers angles around their mean direction,
    calculates linear quantiles, then transforms back.

    Parameters
    ----------
    angles
        Array of angles in radians, shape (n,)
    probs
        Probability points at which to calculate quantiles (e.g., [0.025, 0.975])

    Returns
    -------
    :
        Quantiles at the requested probability points

    Examples
    --------
    >>> angles = np.array([0.1, 0.2, 6.2, 6.3])  # Two near 0, two near 2π
    >>> circular_quantile(angles, [0.25, 0.75])
    array([6.27..., 0.12...])

    Notes
    -----
    This function mirrors the quantile.circumplex_radian method from the
    R package (R/ssm_bootstrap.R lines 72-82). It:
    1. Computes mean direction using atan2
    2. Centers all angles around the mean
    3. Calculates linear quantiles on centered data
    4. Transforms back to [0, 2π)

    """
    # Calculate mean direction
    mean_angle = np.arctan2(np.mean(np.sin(angles)), np.mean(np.cos(angles)))

    # Center angles around mean direction
    angles_centered = (angles - mean_angle + np.pi) % (2 * np.pi) - np.pi

    # Calculate quantiles on centered data
    quantiles_centered = np.quantile(angles_centered, probs)

    # Transform back
    return (quantiles_centered + mean_angle) % (2 * np.pi)

ssm_bootstrap

ssm_bootstrap(data: DataFrame, bootstrap_fn: Callable[[DataFrame, NDArray[Shape[Any], Float]], NDArray[Shape[Any], Float]], boots: int = 2000, grouping_col: str | None = None, *, seed: int | None = None) -> dict[str, Any]

Perform stratified bootstrap with confidence intervals.

Executes bootstrap resampling with stratification by group (if specified) and calculates point estimates and bootstrap replicates; use calculate_confidence_intervals() to derive confidence intervals.

PARAMETER	DESCRIPTION
`data`	DataFrame containing all data for bootstrap sampling TYPE: `DataFrame`
`bootstrap_fn`	Function that takes (data, resample_indices) and returns flat array of parameters for all groups/measures TYPE: `Callable[[DataFrame, NDArray[Shape[Any], Float]], NDArray[Shape[Any], Float]]`
`boots`	Number of bootstrap resamples TYPE: `int` DEFAULT: `2000`
`grouping_col`	Name of grouping column for stratified sampling. If None, uses simple random sampling. TYPE: `str \| None` DEFAULT: `None`
`seed`	Random seed for reproducibility TYPE: `int \| None` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict[str, Any]`	Dictionary containing:

- `t0`: Point estimates (observed parameters)
- `t`: Bootstrap matrix, shape (boots, len(t0))
- `n_params`: Number of parameters per profile (always 6)
- `n_profiles`: Number of profiles (groups/measures)
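The flat layout of `t0` (and of each row of `t`) can be sketched as follows, assuming six parameters per profile stored profile by profile, which is how the source listing indexes them. The numbers are made up for illustration and are not real SSM output.

```python
import numpy as np

N_PARAMS = 6  # matches the hard-coded per-profile count in the source listing
PARAM_NAMES = ["e", "x", "y", "a", "d", "fit"]

# Made-up point estimates for two profiles, flattened profile by profile
t0 = np.array([
    0.10, 0.30, 0.40, 0.50, 0.93, 0.95,   # profile 0
    0.05, -0.20, 0.10, 0.22, 2.68, 0.88,  # profile 1
])

n_profiles = len(t0) // N_PARAMS
by_profile = t0.reshape(n_profiles, N_PARAMS)  # one row per profile

# Flat lookup: start of the profile's block plus the parameter's offset
profile_idx, param_idx = 1, PARAM_NAMES.index("d")
flat_value = t0[profile_idx * N_PARAMS + param_idx]
```

Because `n_profiles` is derived by integer division by 6 in the source listing, a `bootstrap_fn` that returns fewer than six values would appear to yield `n_profiles` equal to 0, so the returned array length should be a multiple of 6 when the result is passed on to calculate_confidence_intervals().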

Examples:

>>> def simple_mean(df, indices):
...     return np.array([df.iloc[indices]['x'].mean()])
>>> data = pd.DataFrame({'x': [1, 2, 3, 4, 5]})
>>> result = ssm_bootstrap(data, simple_mean, boots=100, seed=123)
>>> result['t0']  # Observed mean
array([3.])

Notes

This function mirrors ssm_bootstrap() from the R package (R/ssm_bootstrap.R lines 1-55). It uses stratified sampling when a grouping variable is provided to ensure each bootstrap sample maintains the original group proportions.
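The helper `_stratified_resample` is called in the source but not shown on this page. The following is a hypothetical sketch of what within-group resampling could look like, not the package's implementation. It assumes positional row indices and uses `np.random.default_rng` instead of the global seed used in the source.

```python
import numpy as np
import pandas as pd


def stratified_resample_sketch(data, grouping_col, rng):
    """Resample row positions with replacement, separately within each group."""
    positions = np.arange(len(data))
    groups = data[grouping_col].to_numpy()
    parts = []
    for level in pd.unique(groups):
        members = positions[groups == level]
        # Draw as many rows as the group originally had, so group sizes are preserved
        parts.append(rng.choice(members, size=len(members), replace=True))
    return np.concatenate(parts)


# Toy data: 3 rows in group "a", 7 rows in group "b"
data = pd.DataFrame({"group": ["a"] * 3 + ["b"] * 7, "x": np.arange(10.0)})
rng = np.random.default_rng(123)
idx = stratified_resample_sketch(data, "group", rng)
resampled_counts = data.iloc[idx]["group"].value_counts().to_dict()
```

Each resample keeps 3 rows from group "a" and 7 from group "b", whereas simple random sampling over all rows would let those counts vary from one resample to the next.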

Source code in src/circumplex/analysis/bootstrap.py

def ssm_bootstrap(
    data: pd.DataFrame,
    bootstrap_fn: Callable[
        [pd.DataFrame, NDArray[Shape[Any], Float]], NDArray[Shape[Any], Float]
    ],
    boots: int = 2000,
    grouping_col: str | None = None,
    *,
    seed: int | None = None,
) -> dict[str, Any]:
    """Perform stratified bootstrap with confidence intervals.

    Executes bootstrap resampling with stratification by group (if specified)
    and calculates point estimates and bootstrap replicates;
    use `calculate_confidence_intervals()` to derive confidence intervals.

    Parameters
    ----------
    data
        DataFrame containing all data for bootstrap sampling
    bootstrap_fn
        Function that takes (data, resample_indices) and returns
        flat array of parameters for all groups/measures
    boots
        Number of bootstrap resamples
    grouping_col
        Name of grouping column for stratified sampling.
        If None, uses simple random sampling.
    seed
        Random seed for reproducibility

    Returns
    -------
    :
        Dictionary containing:

        - `t0`: Point estimates (observed parameters)
        - `t`: Bootstrap matrix, shape (boots, len(t0))
        - `n_params`: Number of parameters per profile (always 6)
        - `n_profiles`: Number of profiles (groups/measures)

    Examples
    --------
    >>> def simple_mean(df, indices):
    ...     return np.array([df.iloc[indices]['x'].mean()])
    >>> data = pd.DataFrame({'x': [1, 2, 3, 4, 5]})
    >>> result = ssm_bootstrap(data, simple_mean, boots=100, seed=123)
    >>> result['t0']  # Observed mean
    array([3.])

    Notes
    -----
    This function mirrors ssm_bootstrap() from the R package
    (R/ssm_bootstrap.R lines 1-55). It uses stratified sampling
    when a grouping variable is provided to ensure each bootstrap
    sample maintains the original group proportions.

    """
    if seed is not None:
        np.random.seed(seed)

    n_obs = len(data)

    # Calculate observed parameters (t0)
    observed_indices = np.arange(n_obs)
    t0 = bootstrap_fn(data, observed_indices)

    # Initialize bootstrap matrix
    n_params_total = len(t0)
    t_matrix = np.zeros((boots, n_params_total))

    # Perform bootstrap resampling
    for b in range(boots):
        if grouping_col is not None:
            # Stratified sampling: sample within each group
            resample_indices = _stratified_resample(data, grouping_col)
        else:
            # Simple random sampling with replacement
            resample_indices = np.random.choice(n_obs, size=n_obs, replace=True)

        # Calculate parameters for this resample
        t_matrix[b] = bootstrap_fn(data, resample_indices)

    return {
        "t0": t0,
        "t": t_matrix,
        "n_params": 6,  # Always 6 SSM parameters per profile
        "n_profiles": n_params_total // 6,
    }

calculate_confidence_intervals

calculate_confidence_intervals(bootstrap_results: dict[str, Any], interval: float = 0.95) -> pd.DataFrame

Calculate confidence intervals from bootstrap results.

Computes percentile confidence intervals for all parameters, with special circular handling for displacement parameters.

PARAMETER	DESCRIPTION
`bootstrap_results`	Dictionary from ssm_bootstrap() containing `t0` and `t` TYPE: `dict[str, Any]`
`interval`	Confidence level (e.g., 0.95 for 95% CI) TYPE: `float` DEFAULT: `0.95`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with columns:

- e_est, x_est, y_est, a_est, d_est, fit_est (point estimates)
- e_lci, x_lci, y_lci, a_lci, d_lci (lower CI)
- e_uci, x_uci, y_uci, a_uci, d_uci (upper CI)

Note: fit has no confidence intervals

Notes

This function mirrors the CI calculation in R's ssm_bootstrap() (R/ssm_bootstrap.R lines 38-54). It uses percentile method for linear parameters and circular_quantile() for displacement.
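The percentile method for a linear parameter can be sketched as below. The bootstrap values are simulated stand-ins for one column of `t`, not real SSM output, and the variable names are chosen for illustration.

```python
import numpy as np

interval = 0.95
alpha = 1 - interval
# Probability points for a two-sided interval: about 0.025 and 0.975
lower_prob, upper_prob = alpha / 2, 1 - alpha / 2

rng = np.random.default_rng(0)
# Simulated replicates standing in for one column of the bootstrap matrix
boot_values = rng.normal(loc=0.5, scale=0.1, size=2000)

# Percentile interval: the empirical quantiles of the replicates
lci, uci = np.quantile(boot_values, [lower_prob, upper_prob])

# Share of replicates that fall inside the interval
coverage = np.mean((boot_values >= lci) & (boot_values <= uci))
```

For the displacement parameter the same probability points are passed to circular_quantile() instead of `np.quantile`, so that replicates on either side of the 0/2π boundary are treated as neighbours.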

Source code in src/circumplex/analysis/bootstrap.py

def calculate_confidence_intervals(
    bootstrap_results: dict[str, Any],
    interval: float = 0.95,
) -> pd.DataFrame:
    """Calculate confidence intervals from bootstrap results.

    Computes percentile confidence intervals for all parameters, with
    special circular handling for displacement parameters.

    Parameters
    ----------
    bootstrap_results
        Dictionary from ssm_bootstrap() containing `t0` and `t`
    interval
        Confidence level (e.g., 0.95 for 95% CI)

    Returns
    -------
    :
        DataFrame with columns:

        - e_est, x_est, y_est, a_est, d_est, fit_est (point estimates)
        - e_lci, x_lci, y_lci, a_lci, d_lci (lower CI)
        - e_uci, x_uci, y_uci, a_uci, d_uci (upper CI)

        Note: fit has no confidence intervals

    Notes
    -----
    This function mirrors the CI calculation in R's ssm_bootstrap()
    (R/ssm_bootstrap.R lines 38-54). It uses percentile method for
    linear parameters and circular_quantile() for displacement.

    """
    t0 = bootstrap_results["t0"]
    t_matrix = bootstrap_results["t"]
    n_params = bootstrap_results["n_params"]
    n_profiles = bootstrap_results["n_profiles"]

    # Calculate probability points for CI
    alpha = 1 - interval
    lower_prob = alpha / 2
    upper_prob = 1 - alpha / 2

    # Parameter names and the list of per-profile result rows
    param_names = ["e", "x", "y", "a", "d", "fit"]
    results = []

    for profile_idx in range(n_profiles):
        # Extract parameters for this profile
        param_start = profile_idx * n_params
        profile_params = {}

        for param_idx, param_name in enumerate(param_names):
            obs_value = t0[param_start + param_idx]
            boot_values = t_matrix[:, param_start + param_idx]

            # Point estimate
            profile_params[f"{param_name}_est"] = obs_value

            # Confidence intervals (skip fit)
            if param_name != "fit":
                if param_name == "d":
                    # Use circular quantile for displacement
                    ci = circular_quantile(boot_values, [lower_prob, upper_prob])

                else:
                    # Use regular quantile for other parameters
                    ci = np.quantile(boot_values, [lower_prob, upper_prob])

                profile_params[f"{param_name}_lci"] = ci[0]  # type: ignore[non-subscriptable]
                profile_params[f"{param_name}_uci"] = ci[1]  # type: ignore[non-subscriptable]

        results.append(profile_params)

    return pd.DataFrame(results)