Databases¶

soundscapy.databases ¶

Soundscapy Databases Module.

This module handles connections to and operations on soundscape databases, primarily focused on the International Soundscape Database (ISD).

MODULE	DESCRIPTION
`araus`	Customized functions specifically for the ARAUS dataset.
`isd`	Module for handling the International Soundscape Database (ISD).
`satp`	Module for handling the Soundscape Attributes Translation Project (SATP) database.

ISD¶

soundscapy.databases.isd ¶

Module for handling the International Soundscape Database (ISD).

This module provides functions for loading, validating, and analyzing data from the International Soundscape Database. It includes utilities for data retrieval, quality checks, and basic analysis operations.

Notes

The ISD is a large-scale database of soundscape surveys and recordings collected across multiple cities. This module is designed to work with the specific structure and content of the ISD.

Examples:

>>> import pandas as pd
>>> import soundscapy.databases.isd as isd
>>> df = isd.load()
>>> isinstance(df, pd.DataFrame)
True
>>> 'PAQ1' in df.columns
True

FUNCTION	DESCRIPTION
`load`	Load the example "ISD" CSV file into a DataFrame.
`load_zenodo`	Automatically fetch and load the ISD dataset from Zenodo.
`validate`	Perform data quality checks and validate that the dataset fits the expected format.
`match_col_to_likert_scale`	Match a column in the DataFrame to the Likert scale.
`likert_categorical_from_data`	Get the Likert labels for a specific column in the DataFrame.
`select_record_ids`	Filter the dataframe by RecordID.
`select_group_ids`	Filter the dataframe by GroupID.
`select_session_ids`	Filter the dataframe by SessionID.
`select_location_ids`	Filter the dataframe by LocationID.
`describe_location`	Return a summary of the data for a specific location.
`soundscapy_describe`	Return a summary of the data grouped by a specified column.

load ¶

load(locations: list[str] | None = None) -> pd.DataFrame

Load the example "ISD" CSV file into a DataFrame.

PARAMETER	DESCRIPTION
`locations`	Optional list of LocationIDs to filter the data by. If None, loads all data. TYPE: `list[str] | None` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame containing ISD data.

Notes

This function loads the ISD data from a local CSV file included with the soundscapy package.

References

Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M., Lionello, M., & Kang, J. (2022). The International Soundscape Database: An integrated multimedia database of urban soundscape surveys -- questionnaires with acoustical and contextual information (0.2.4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6331810

Examples:

>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load()
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True

Source code in src/soundscapy/databases/isd.py

def load(locations: list[str] | None = None) -> pd.DataFrame:
    """
    Load the example "ISD" csv file to a DataFrame.

    Parameters
    ----------
    locations
        Optional list of LocationIDs to filter the data by. If None, loads all data.

    Returns
    -------
    :
        DataFrame containing ISD data.

    Notes
    -----
    This function loads the ISD data from a local CSV file included
    with the soundscapy package.

    References
    ----------
    Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M.,
    Lionello, M., & Kang, J. (2022). The International Soundscape Database:
    An integrated multimedia database of urban soundscape surveys --
    questionnaires with acoustical and contextual information (0.2.4) [Data set].
    Zenodo. https://doi.org/10.5281/zenodo.6331810

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS
    >>> df = load()
    >>> isinstance(df, pd.DataFrame)
    True
    >>> set(PAQ_IDS).issubset(df.columns)
    True

    """
    isd_resource = resources.files("soundscapy.data").joinpath("ISD v1.0 Data.csv")
    with resources.as_file(isd_resource) as f:
        data = pd.read_csv(f)
    data = rename_paqs(data, _PAQ_ALIASES)
    logger.info("Loaded ISD data from Soundscapy's included CSV file.")

    if locations is not None:
        data = select_location_ids(data, locations)

    return data

load_zenodo ¶

load_zenodo(version: str = 'latest') -> pd.DataFrame

Automatically fetch and load the ISD dataset from Zenodo.

PARAMETER	DESCRIPTION
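`version`	Version of the dataset to fetch. Recognised values are "v0.2.0", "v0.2.1", "v0.2.2", "v0.2.3", "v1.0.0", "v1.0.1", and "latest" (currently resolved to "v1.0.1"); matching is case-insensitive. TYPE: `str` DEFAULT: `'latest'`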

RETURNS	DESCRIPTION
`DataFrame`	DataFrame containing ISD data.

RAISES	DESCRIPTION
`ValueError`	If the specified version is not recognized.

Notes

This function fetches the ISD data directly from Zenodo, allowing access to different versions of the dataset.

Examples:

>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load_zenodo("v1.0.1")
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True

Source code in src/soundscapy/databases/isd.py

def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Automatically fetch and load the ISD dataset from Zenodo.

    Parameters
    ----------
    version
        Version number of the dataset to fetch, by default "latest".

    Returns
    -------
    :
        DataFrame containing ISD data.

    Raises
    ------
    ValueError
        If the specified version is not recognized.

    Notes
    -----
    This function fetches the ISD data directly from Zenodo, allowing
    access to different versions of the dataset.

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS  # doctest: +SKIP
    >>> df = load_zenodo("v1.0.1")  # doctest: +SKIP
    >>> isinstance(df, pd.DataFrame)  # doctest: +SKIP
    True
    >>> set(PAQ_IDS).issubset(df.columns)  # doctest: +SKIP
    True

    """
    version = version.lower()
    version = "v1.0.1" if version == "latest" else version

    url_mapping = {
        "v0.2.0": "https://zenodo.org/record/5578573/files/SSID%20Lockdown%20Database%20VL0.2.1.xlsx",
        "v0.2.1": "https://zenodo.org/record/5578573/files/SSID%20Lockdown%20Database%20VL0.2.1.xlsx",
        "v0.2.2": "https://zenodo.org/record/5705908/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
        "v0.2.3": "https://zenodo.org/record/5914762/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
        "v1.0.0": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
        "v1.0.1": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
    }

    if version not in url_mapping:
        msg = f"Version {version} not recognised."
        raise ValueError(msg)

    url = url_mapping[version]
    file_type = "csv" if version in ["v1.0.0", "v1.0.1"] else "excel"

    data = (
        pd.read_csv(url)
        if file_type == "csv"
        else pd.read_excel(url, engine="openpyxl")
    )
    data = rename_paqs(data, _PAQ_ALIASES)

    logger.info(f"Loaded ISD data version {version} from Zenodo")
    return data
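
The version handling above can be sketched without any network access. The following is a minimal, dependency-free illustration, not the library's API: `resolve_isd_source` and `ISD_URLS` are hypothetical names, only a subset of the URLs is copied from the mapping above, and the file type is inferred from the URL suffix rather than from the version string as the source does.

```python
# Hypothetical stand-alone sketch of the version lookup; not part of soundscapy.
ISD_URLS = {
    "v0.2.2": "https://zenodo.org/record/5705908/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
    "v1.0.0": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
    "v1.0.1": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
}
LATEST = "v1.0.1"  # the version that "latest" resolves to in the source above


def resolve_isd_source(version: str = "latest") -> tuple[str, str]:
    """Return (url, file_type) for a version string, ignoring case."""
    key = version.lower()
    if key == "latest":
        key = LATEST
    if key not in ISD_URLS:
        raise ValueError(f"Version {key} not recognised.")
    url = ISD_URLS[key]
    # Assumption: the suffix is enough to choose between read_csv and read_excel.
    file_type = "csv" if url.endswith(".csv") else "excel"
    return url, file_type
```

Resolving the alias before the membership check means "latest" never needs its own entry in the mapping, so a new release only requires one new URL and an updated `LATEST`.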

validate ¶

validate(
    df: DataFrame,
    paq_aliases: list | dict = _PAQ_ALIASES,
    val_range: tuple[int, int] = (1, 5),
    *,
    allow_paq_na: bool = False,
) -> tuple[pd.DataFrame, pd.DataFrame | None]

Perform data quality checks and validate that the dataset fits the expected format.

PARAMETER	DESCRIPTION
`df`	ISD style dataframe, including PAQ data. TYPE: `DataFrame`
`paq_aliases`	List of PAQ names (in order) or dict of PAQ names with new names as values. TYPE: `list | dict` DEFAULT: `_PAQ_ALIASES`
`allow_paq_na`	If True, allow NaN values in PAQ data, by default False. TYPE: `bool` DEFAULT: `False`
`val_range`	Min and max range of the PAQ response values, by default (1, 5). TYPE: `tuple[int, int]` DEFAULT: `(1, 5)`

RETURNS	DESCRIPTION
`tuple[DataFrame, DataFrame | None]`	Tuple containing the cleaned dataframe and a dataframe of excluded samples. The second element is `None` if no rows were excluded.

Notes

This function renames PAQ columns, checks PAQ data quality, and removes the rows that fail the checks. Missing PAQ values are tolerated only when `allow_paq_na` is True.

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
... })
>>> clean_df, excl_df = validate(df, allow_paq_na=True)
>>> clean_df.shape[0]
2
>>> excl_df.shape[0]
2

Source code in src/soundscapy/databases/isd.py

def validate(
    df: pd.DataFrame,
    paq_aliases: list | dict = _PAQ_ALIASES,
    val_range: tuple[int, int] = (1, 5),
    *,
    allow_paq_na: bool = False,
) -> tuple[pd.DataFrame, pd.DataFrame | None]:
    """
    Perform data quality checks and validate that the dataset fits the expected format.

    Parameters
    ----------
    df
        ISD style dataframe, including PAQ data.
    paq_aliases
        List of PAQ names (in order) or dict of PAQ names with new names as values.
    allow_paq_na
        If True, allow NaN values in PAQ data, by default False.
    val_range
        Min and max range of the PAQ response values, by default (1, 5).

    Returns
    -------
    :
        Tuple containing the cleaned dataframe
        and optionally a dataframe of excluded samples.

    Notes
    -----
    This function renames PAQ columns, checks PAQ data quality, and optionally
    removes rows with invalid or missing PAQ values.

    Examples
    --------
    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({
    ...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
    ...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
    ...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
    ... })
    >>> clean_df, excl_df = validate(df, allow_paq_na=True)
    >>> clean_df.shape[0]
    2
    >>> excl_df.shape[0]
    2

    """
    logger.info("Validating ISD data")
    data = rename_paqs(df, paq_aliases)

    invalid_indices = likert_data_quality(
        data, val_range=val_range, allow_na=allow_paq_na
    )

    if invalid_indices:
        excl_data = data.iloc[invalid_indices]
        data = data.drop(data.index[invalid_indices])
        logger.info(f"Removed {len(invalid_indices)} rows with invalid PAQ data")
    else:
        excl_data = None
        logger.info("All PAQ data passed quality checks")

    return data, excl_data
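
The clean/excluded split performed above can be illustrated with a small dependency-free sketch. Plain dicts stand in for DataFrame rows, and `split_by_range` is a hypothetical helper: the real checks live in `likert_data_quality`, which is not shown here and may apply further rules, so the range test below is only a stand-in.

```python
import math

# Assumption: eight PAQ columns named PAQ1..PAQ8, as in the examples above.
PAQ_COLS = [f"PAQ{i}" for i in range(1, 9)]


def split_by_range(rows, val_range=(1, 5), allow_na=False):
    """Split rows into (clean, excluded); excluded is None when nothing fails."""
    lo, hi = val_range
    clean, excluded = [], []
    for row in rows:
        ok = True
        for col in PAQ_COLS:
            value = row[col]
            if isinstance(value, float) and math.isnan(value):
                # Missing values pass only when explicitly allowed.
                ok = ok and allow_na
            elif not lo <= value <= hi:
                ok = False
        (clean if ok else excluded).append(row)
    # Mirror the return convention above: None instead of an empty container.
    return clean, (excluded or None)
```

Returning `None` rather than an empty collection lets callers test `if excluded is None` to confirm that every row passed.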

match_col_to_likert_scale ¶

match_col_to_likert_scale(col: str | None) -> Scale

Match a column in the DataFrame to the Likert scale.

PARAMETER	DESCRIPTION
`col`	Column name to match. TYPE: `str | None`

RETURNS	DESCRIPTION
`Scale`	Likert scale object.

RAISES	DESCRIPTION
`ValueError`	If the column does not match any known Likert scale.

Source code in src/soundscapy/databases/isd.py

def match_col_to_likert_scale(col: str | None) -> Scale:  # noqa: PLR0911
    """
    Match a column in the DataFrame to the Likert scale.

    Parameters
    ----------
    col
        Column name to match.

    Returns
    -------
    :
        Likert scale object.

    """
    if col in PAQ_IDS or col in PAQ_LABELS:
        return LIKERT_SCALES.paq
    if col in ["traffic_noise", "other_noise", "human_sounds", "natural_sounds"]:
        return LIKERT_SCALES.source
    if col in ["overall_sound_environment"]:
        return LIKERT_SCALES.overall
    if col in ["appropriate"]:
        return LIKERT_SCALES.appropriate
    if col in ["perceived_loud"]:
        return LIKERT_SCALES.loud
    if col in ["visit_often"]:
        return LIKERT_SCALES.often
    if col in ["like_to_visit"]:
        return LIKERT_SCALES.visit

    msg = f"Column {col} does not match any known Likert scale."
    raise ValueError(msg)
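
The chain of membership tests above amounts to a lookup table. As a rough sketch under stated assumptions, the same mapping could be expressed with a dict. Here `scale_name_for` is a hypothetical helper that returns the scale's attribute name as a string instead of a scale object, the PAQ id list is assumed to be PAQ1 to PAQ8, and the PAQ label names accepted by the original are omitted.

```python
# Assumption: PAQ ids are PAQ1..PAQ8; the real PAQ_IDS is imported by the library.
PAQ_IDS = [f"PAQ{i}" for i in range(1, 9)]

# Column name -> name of the matching scale attribute used in the source above.
_COLUMN_TO_SCALE = {
    **{paq: "paq" for paq in PAQ_IDS},
    "traffic_noise": "source",
    "other_noise": "source",
    "human_sounds": "source",
    "natural_sounds": "source",
    "overall_sound_environment": "overall",
    "appropriate": "appropriate",
    "perceived_loud": "loud",
    "visit_often": "often",
    "like_to_visit": "visit",
}


def scale_name_for(col):
    """Return the scale name for a column, or raise ValueError if unknown."""
    try:
        return _COLUMN_TO_SCALE[col]
    except KeyError:
        msg = f"Column {col} does not match any known Likert scale."
        raise ValueError(msg) from None
```

A table keeps the function to a single return path and makes adding a new column a one-line change, at the cost of building the dict once at import time.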

likert_categorical_from_data ¶

likert_categorical_from_data(
    data: Series,
) -> pd.Categorical

Get the Likert labels for a specific column in the DataFrame.

PARAMETER	DESCRIPTION
`data`	Series containing the data. TYPE: `Series`

RETURNS	DESCRIPTION
`Categorical`	Categorical with Likert labels.

RAISES	DESCRIPTION
`ValueError`	If the column does not match any known Likert scale.

Source code in src/soundscapy/databases/isd.py

def likert_categorical_from_data(
    data: pd.Series,
) -> pd.Categorical:
    """
    Get the Likert labels for a specific column in the DataFrame.

    Parameters
    ----------
    data
        Series containing the data.

    Returns
    -------
    :
        Series with Likert labels.

    Raises
    ------
    ValueError
        If the column does not match any known Likert scale.

    """
    likert_scale = match_col_to_likert_scale(str(data.name))
    if isinstance(data, pd.Categorical):
        return data

    data = data.astype("int") - 1  # Convert to zero-based index
    codes = data.to_list()

    return pd.Categorical.from_codes(
        codes,
        dtype=CategoricalDtype(categories=likert_scale, ordered=True),
    )
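
The conversion above relies on shifting one-based survey responses to zero-based codes that index into an ordered list of labels. A minimal sketch of that idea in plain Python follows. The label list is hypothetical (the library's own scale labels may differ) and `responses_to_labels` is not a soundscapy function.

```python
# Hypothetical five-point labels, ordered from lowest to highest response.
AGREEMENT = [
    "Strongly disagree",
    "Disagree",
    "Neutral",
    "Agree",
    "Strongly agree",
]


def responses_to_labels(responses, labels=AGREEMENT):
    """Map one-based responses (1..len(labels)) to their labels."""
    codes = [int(r) - 1 for r in responses]  # convert to zero-based index
    for code in codes:
        # Guard explicitly: a negative index would silently wrap in a list.
        if not 0 <= code < len(labels):
            raise ValueError(f"Response {code + 1} is outside 1..{len(labels)}")
    return [labels[code] for code in codes]
```

The explicit bounds check matters because a response of 0 becomes code -1. pandas treats a code of -1 in `Categorical.from_codes` as a missing value, so an out-of-range 0 would turn into NaN there rather than raise.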

select_record_ids ¶

select_record_ids(
    data: DataFrame, record_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by RecordID.

PARAMETER	DESCRIPTION
`data`	ISD dataframe. TYPE: `DataFrame`
`record_ids`	RecordID(s) to filter by. TYPE: `str | int | list | tuple`

RETURNS	DESCRIPTION
`DataFrame`	Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'RecordID': ['A', 'B', 'C', 'D'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_record_ids(df, ['A', 'C'])
  RecordID  Value
0        A      1
2        C      3

Source code in src/soundscapy/databases/isd.py

def select_record_ids(
    data: pd.DataFrame, record_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by RecordID.

    Parameters
    ----------
    data
        ISD dataframe.
    record_ids
        RecordID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'RecordID': ['A', 'B', 'C', 'D'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_record_ids(df, ['A', 'C'])
      RecordID  Value
    0        A      1
    2        C      3

    """
    return _isd_select(data, "RecordID", record_ids)

select_group_ids ¶

select_group_ids(
    data: DataFrame, group_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by GroupID.

PARAMETER	DESCRIPTION
`data`	ISD dataframe. TYPE: `DataFrame`
`group_ids`	GroupID(s) to filter by. TYPE: `str | int | list | tuple`

RETURNS	DESCRIPTION
`DataFrame`	Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'GroupID': ['G1', 'G1', 'G2', 'G2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_group_ids(df, 'G1')
  GroupID  Value
0      G1      1
1      G1      2

Source code in src/soundscapy/databases/isd.py

def select_group_ids(
    data: pd.DataFrame, group_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by GroupID.

    Parameters
    ----------
    data
        ISD dataframe.
    group_ids
        GroupID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'GroupID': ['G1', 'G1', 'G2', 'G2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_group_ids(df, 'G1')
      GroupID  Value
    0      G1      1
    1      G1      2

    """
    return _isd_select(data, "GroupID", group_ids)

select_session_ids ¶

select_session_ids(
    data: DataFrame, session_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by SessionID.

PARAMETER	DESCRIPTION
`data`	ISD dataframe. TYPE: `DataFrame`
`session_ids`	SessionID(s) to filter by. TYPE: `str | int | list | tuple`

RETURNS	DESCRIPTION
`DataFrame`	Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'SessionID': ['S1', 'S1', 'S2', 'S2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_session_ids(df, ['S1', 'S2'])
  SessionID  Value
0        S1      1
1        S1      2
2        S2      3
3        S2      4

Source code in src/soundscapy/databases/isd.py

def select_session_ids(
    data: pd.DataFrame, session_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by SessionID.

    Parameters
    ----------
    data
        ISD dataframe.
    session_ids
        SessionID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'SessionID': ['S1', 'S1', 'S2', 'S2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_session_ids(df, ['S1', 'S2'])
      SessionID  Value
    0        S1      1
    1        S1      2
    2        S2      3
    3        S2      4

    """
    return _isd_select(data, "SessionID", session_ids)

select_location_ids ¶

select_location_ids(
    data: DataFrame, location_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by LocationID.

PARAMETER	DESCRIPTION
`data`	ISD dataframe. TYPE: `DataFrame`
`location_ids`	LocationID(s) to filter by. TYPE: `str | int | list | tuple`

RETURNS	DESCRIPTION
`DataFrame`	Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_location_ids(df, 'L2')
  LocationID  Value
2         L2      3
3         L2      4

Source code in src/soundscapy/databases/isd.py

def select_location_ids(
    data: pd.DataFrame, location_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by LocationID.

    Parameters
    ----------
    data
        ISD dataframe.
    location_ids
        LocationID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_location_ids(df, 'L2')
      LocationID  Value
    2         L2      3
    3         L2      4

    """
    return _isd_select(data, "LocationID", location_ids)
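
All four `select_*_ids` functions delegate to a private `_isd_select` helper whose source is not shown on this page. The sketch below is only a guess at the general pattern (accept a scalar or a collection, then keep matching rows), written against a list of dicts rather than a DataFrame; `select_rows` is a hypothetical name.

```python
def select_rows(rows, column, values):
    """Keep rows whose `column` equals one of `values`.

    `values` may be a single str/int or a list/tuple of them.
    """
    if isinstance(values, (str, int)):
        # Wrap scalars; iterating a bare string would yield its characters.
        values = [values]
    wanted = set(values)
    return [row for row in rows if row[column] in wanted]


rows = [
    {"LocationID": "L1", "Value": 1},
    {"LocationID": "L1", "Value": 2},
    {"LocationID": "L2", "Value": 3},
]
l2_rows = select_rows(rows, "LocationID", "L2")
```

With a DataFrame, the equivalent filter would typically be a boolean mask built with `Series.isin`, which preserves the original index labels as seen in the examples above.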

describe_location ¶

describe_location(
    data: DataFrame,
    location: str,
    calc_type: str = "percent",
    pl_threshold: float = 0,
    ev_threshold: float = 0,
) -> dict[str, int | float]

Return a summary of the data for a specific location.

PARAMETER	DESCRIPTION
`data`	ISD dataframe. TYPE: `DataFrame`
`location`	Location to describe. TYPE: `str`
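`calc_type`	Type of summary: "percent" returns proportions between 0 and 1 (rounded to 3 decimals), "count" returns raw counts. Any other value raises a `ValueError`. By default "percent". TYPE: `str` DEFAULT: `'percent'`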
`pl_threshold`	Pleasantness threshold, by default 0. TYPE: `float` DEFAULT: `0`
`ev_threshold`	Eventfulness threshold, by default 0. TYPE: `float` DEFAULT: `0`

RETURNS	DESCRIPTION
`dict[str, int | float]`	Summary of the data for the specified location.

Examples:

>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'PAQ1': [4, 2, 3, 5],
...     'PAQ2': [3, 5, 2, 4],
...     'PAQ3': [2, 4, 1, 3],
...     'PAQ4': [1, 3, 4, 2],
...     'PAQ5': [5, 1, 5, 1],
...     'PAQ6': [4, 2, 3, 5],
...     'PAQ7': [3, 5, 2, 4],
...     'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = describe_location(df, 'L1')
>>> set(result.keys()) == {
...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
...     'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result['count']
2

Source code in src/soundscapy/databases/isd.py

def describe_location(
    data: pd.DataFrame,
    location: str,
    calc_type: str = "percent",
    pl_threshold: float = 0,
    ev_threshold: float = 0,
) -> dict[str, int | float]:
    """
    Return a summary of the data for a specific location.

    Parameters
    ----------
    data
        ISD dataframe.
    location
        Location to describe.
    calc_type
        Type of summary, either "percent" or "count", by default "percent".
    pl_threshold
        Pleasantness threshold, by default 0.
    ev_threshold
        Eventfulness threshold, by default 0.

    Returns
    -------
    :
        Summary of the data for the specified location.

    Examples
    --------
    >>> from soundscapy.surveys.processing import add_iso_coords
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'PAQ1': [4, 2, 3, 5],
    ...     'PAQ2': [3, 5, 2, 4],
    ...     'PAQ3': [2, 4, 1, 3],
    ...     'PAQ4': [1, 3, 4, 2],
    ...     'PAQ5': [5, 1, 5, 1],
    ...     'PAQ6': [4, 2, 3, 5],
    ...     'PAQ7': [3, 5, 2, 4],
    ...     'PAQ8': [2, 4, 1, 3],
    ... })
    >>> df = add_iso_coords(df)
    >>> result = describe_location(df, 'L1')
    >>> set(result.keys()) == {
    ...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
    ...     'vibrant', 'chaotic', 'monotonous', 'calm'
    ... }
    True
    >>> result['count']
    2

    """
    loc_df = select_location_ids(data, location_ids=location)
    count = len(loc_df)

    if "ISOPleasant" not in loc_df.columns or "ISOEventful" not in loc_df.columns:
        iso_pleasant, iso_eventful = calculate_iso_coords(loc_df)
        loc_df = loc_df.assign(ISOPleasant=iso_pleasant, ISOEventful=iso_eventful)

    pl_count = (loc_df["ISOPleasant"] > pl_threshold).sum()
    ev_count = (loc_df["ISOEventful"] > ev_threshold).sum()
    vibrant_count = (
        (loc_df["ISOPleasant"] > pl_threshold) & (loc_df["ISOEventful"] > ev_threshold)
    ).sum()
    chaotic_count = (
        (loc_df["ISOPleasant"] < pl_threshold) & (loc_df["ISOEventful"] > ev_threshold)
    ).sum()
    mono_count = (
        (loc_df["ISOPleasant"] < pl_threshold) & (loc_df["ISOEventful"] < ev_threshold)
    ).sum()
    calm_count = (
        (loc_df["ISOPleasant"] > pl_threshold) & (loc_df["ISOEventful"] < ev_threshold)
    ).sum()

    res = {
        "count": count,
        "ISOPleasant": loc_df["ISOPleasant"].mean(),
        "ISOEventful": loc_df["ISOEventful"].mean(),
    }

    if calc_type == "percent":
        res.update(
            {
                "pleasant": pl_count / count,
                "eventful": ev_count / count,
                "vibrant": vibrant_count / count,
                "chaotic": chaotic_count / count,
                "monotonous": mono_count / count,
                "calm": calm_count / count,
            }
        )
    elif calc_type == "count":
        res.update(
            {
                "pleasant": pl_count,
                "eventful": ev_count,
                "vibrant": vibrant_count,
                "chaotic": chaotic_count,
                "monotonous": mono_count,
                "calm": calm_count,
            }
        )
    else:
        msg = "Type must be either 'percent' or 'count'"
        raise ValueError(msg)

    return {k: round(v, 3) if isinstance(v, float) else v for k, v in res.items()}
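
The quadrant counts above use strict inequalities on both axes, so a response lying exactly on a threshold belongs to no quadrant. The following dependency-free sketch shows that behaviour on plain tuples; `quadrant_shares` is a hypothetical helper, not a soundscapy function, and the empty-input guard is an addition of the sketch.

```python
def quadrant_shares(points, pl_threshold=0.0, ev_threshold=0.0):
    """Return the share of points per quadrant.

    `points` is a list of (iso_pleasant, iso_eventful) tuples.
    """
    n = len(points)
    if n == 0:
        raise ValueError("No points to summarise")  # guard added by this sketch
    counts = {"vibrant": 0, "chaotic": 0, "monotonous": 0, "calm": 0}
    for pleasant, eventful in points:
        if pleasant > pl_threshold and eventful > ev_threshold:
            counts["vibrant"] += 1
        elif pleasant < pl_threshold and eventful > ev_threshold:
            counts["chaotic"] += 1
        elif pleasant < pl_threshold and eventful < ev_threshold:
            counts["monotonous"] += 1
        elif pleasant > pl_threshold and eventful < ev_threshold:
            counts["calm"] += 1
        # Points exactly on a threshold fall through and are not counted.
    return {name: round(count / n, 3) for name, count in counts.items()}


shares = quadrant_shares([(0.5, 0.5), (-0.5, 0.5), (0.0, 0.3), (0.2, -0.1)])
```

Because of the on-threshold case, the four quadrant shares need not sum to 1. In this example the third point has a pleasantness of exactly 0, so the shares sum to 0.75.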

soundscapy_describe ¶

soundscapy_describe(
    df: DataFrame,
    group_by: str = "LocationID",
    calc_type: str = "percent",
) -> pd.DataFrame

Return a summary of the data grouped by a specified column.

PARAMETER	DESCRIPTION
`df`	ISD dataframe. TYPE: `DataFrame`
`group_by`	Column to group by, by default "LocationID". TYPE: `str` DEFAULT: `'LocationID'`
`calc_type`	Type of summary: "percent" returns proportions between 0 and 1, "count" returns raw counts. By default "percent". TYPE: `str` DEFAULT: `'percent'`

RETURNS	DESCRIPTION
`DataFrame`	Summary of the data.

Examples:

>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'PAQ1': [4, 2, 3, 5],
...     'PAQ2': [3, 5, 2, 4],
...     'PAQ3': [2, 4, 1, 3],
...     'PAQ4': [1, 3, 4, 2],
...     'PAQ5': [5, 1, 5, 1],
...     'PAQ6': [4, 2, 3, 5],
...     'PAQ7': [3, 5, 2, 4],
...     'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = soundscapy_describe(df)
>>> isinstance(result, pd.DataFrame)
True
>>> result.index.tolist()
['L1', 'L2']
>>> set(result.columns) == {
...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
...     'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result = soundscapy_describe(df, calc_type="count")
>>> result.loc['L1', 'count']
2

Source code in src/soundscapy/databases/isd.py

def soundscapy_describe(
    df: pd.DataFrame, group_by: str = "LocationID", calc_type: str = "percent"
) -> pd.DataFrame:
    """
    Return a summary of the data grouped by a specified column.

    Parameters
    ----------
    df
        ISD dataframe.
    group_by
        Column to group by, by default "LocationID".
    calc_type
        Type of summary, either "percent" or "count", by default "percent".

    Returns
    -------
    :
        Summary of the data.

    Examples
    --------
    >>> from soundscapy.surveys.processing import add_iso_coords
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'PAQ1': [4, 2, 3, 5],
    ...     'PAQ2': [3, 5, 2, 4],
    ...     'PAQ3': [2, 4, 1, 3],
    ...     'PAQ4': [1, 3, 4, 2],
    ...     'PAQ5': [5, 1, 5, 1],
    ...     'PAQ6': [4, 2, 3, 5],
    ...     'PAQ7': [3, 5, 2, 4],
    ...     'PAQ8': [2, 4, 1, 3],
    ... })
    >>> df = add_iso_coords(df)
    >>> result = soundscapy_describe(df)
    >>> isinstance(result, pd.DataFrame)
    True
    >>> result.index.tolist()
    ['L1', 'L2']
    >>> set(result.columns) == {
    ...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
    ...     'vibrant', 'chaotic', 'monotonous', 'calm'
    ... }
    True
    >>> result = soundscapy_describe(df, calc_type="count")
    >>> result.loc['L1', 'count']
    2

    """
    res = {
        location: describe_location(df, location, calc_type=calc_type)
        for location in df[group_by].unique()
    }
    return pd.DataFrame.from_dict(res, orient="index")

ARAUS¶

soundscapy.databases.araus ¶

Customized functions specifically for the ARAUS dataset.

SATP database helpers¶

soundscapy.databases.satp ¶

Module for handling the Soundscape Attributes Translation Project (SATP) database.

This module provides functions for loading and processing data from the Soundscape Attributes Translation Project database. It includes utilities for data retrieval from Zenodo and basic data loading operations.

Examples:

>>> import pandas as pd
>>> import soundscapy.databases.satp as satp
>>> df = satp.load_zenodo()
>>> isinstance(df, pd.DataFrame)
True
>>> 'Language' in df.columns
True
>>> satp.load_participants()
Traceback (most recent call last):
    ...
ValueError: Participant data is only available for SATP versions up to v1.2.1.
>>> participants = satp.load_participants(version="v1.2")
>>> isinstance(participants, pd.DataFrame)
True
>>> 'Age' in participants.columns
True

CLASS	DESCRIPTION
`SATPVersion`	Versioned SATP dataset releases on Zenodo.

FUNCTION	DESCRIPTION
`load_zenodo`	Load the SATP dataset from Zenodo.
`load_participants`	Load the SATP participants dataset from Zenodo.

SATPVersion ¶

Bases: Enum

Versioned SATP dataset releases on Zenodo.

Each member stores the canonical version string and its Zenodo download URL. Version strings are normalised on lookup so "1.5", "v1.5", and "V1.5" all resolve to the same member. The string "latest" resolves to the first (newest) member.

Examples:

>>> SATPVersion("v1.2").url
'https://zenodo.org/record/7143599/files/SATP%20Dataset%20v1.2.xlsx'
>>> SATPVersion("1.2") is SATPVersion("V1.2")
True
>>> SATPVersion("latest") is SATPVersion.V1_5
True
>>> SATPVersion("invalid")
Traceback (most recent call last):
    ...
ValueError: 'invalid' is not a valid SATPVersion

METHOD	DESCRIPTION
`__new__`	Create a new member with a canonical version string and download URL.
`latest`	Return the most recent released version (first declared member).
`__lt__`	Return True if this version is older than other.
`__str__`	Return the canonical version string.

__new__ ¶

__new__(version: str, url: str) -> Self

Create a new member with a canonical version string and download URL.

Source code in src/soundscapy/databases/satp.py

def __new__(cls, version: str, url: str) -> Self:
    """Create a new member with a canonical version string and download URL."""
    obj = object.__new__(cls)
    obj._value_ = version
    obj.url = url
    return obj

latest `classmethod` ¶

latest() -> SATPVersion

Return the most recent released version (first declared member).

Source code in src/soundscapy/databases/satp.py

@classmethod
def latest(cls) -> SATPVersion:
    """Return the most recent released version (first declared member)."""
    return next(iter(cls))

__lt__ ¶

__lt__(other: object) -> bool

Return True if this version is older than other.

Source code in src/soundscapy/databases/satp.py

def __lt__(self, other: object) -> bool:
    """Return True if this version is older than other."""
    if not isinstance(other, SATPVersion):
        return NotImplemented
    return Version(str(self)) < Version(str(other))

__str__ ¶

__str__() -> str

Return the canonical version string.

Source code in src/soundscapy/databases/satp.py

def __str__(self) -> str:
    """Return the canonical version string."""
    return self.value
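
The methods listed above do not show how version strings are normalised on lookup. One way this could be done is through the `_missing_` hook of `enum.Enum`. The sketch below assumes that approach: the `DatasetVersion` class, its members, and the example.org URLs are all hypothetical, and versions are compared as integer tuples instead of with the `Version` class used in the source.

```python
from enum import Enum


class DatasetVersion(Enum):
    """Hypothetical versioned releases, declared newest first."""

    V2_0 = ("v2.0", "https://example.org/data-v2.0.xlsx")
    V1_2 = ("v1.2", "https://example.org/data-v1.2.xlsx")

    def __new__(cls, version, url):
        # Store the version string as the value and keep the URL as an attribute.
        obj = object.__new__(cls)
        obj._value_ = version
        obj.url = url
        return obj

    @classmethod
    def _missing_(cls, value):
        # Called only when a plain value lookup fails.
        if not isinstance(value, str):
            return None
        key = value.strip().lower()
        if key == "latest":
            return next(iter(cls))  # first declared member
        if not key.startswith("v"):
            key = "v" + key
        for member in cls:
            if member.value == key:
                return member
        return None  # Enum then raises ValueError

    def release_tuple(self):
        """Return the version as a tuple of ints, e.g. (1, 2)."""
        return tuple(int(part) for part in self.value[1:].split("."))

    def __lt__(self, other):
        if not isinstance(other, DatasetVersion):
            return NotImplemented
        return self.release_tuple() < other.release_tuple()
```

Defining only `__lt__` is enough for expressions such as `a > b`, because Python falls back to the reflected `b < a` when the left operand does not implement `__gt__`.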

load_zenodo ¶

load_zenodo(version: str = 'latest') -> pd.DataFrame

Load the SATP dataset from Zenodo.

PARAMETER	DESCRIPTION
`version`	Version of the dataset to load. The default is "latest". TYPE: `str` DEFAULT: `'latest'`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame containing the SATP dataset.

Source code in src/soundscapy/databases/satp.py

def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP dataset from Zenodo.

    Parameters
    ----------
    version
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    :
        DataFrame containing the SATP dataset.

    """
    resolved = SATPVersion(version)
    logger.debug(f"Fetching SATP dataset URL for version: {resolved}")
    data = pd.read_excel(resolved.url, engine="openpyxl", sheet_name="Main Merge")
    logger.info(f"Loaded SATP dataset version {resolved} from Zenodo")
    return data

load_participants ¶

load_participants(version: str = 'latest') -> pd.DataFrame

Load the SATP participants dataset from Zenodo.

PARAMETER	DESCRIPTION
`version`	Version of the dataset to load. The default is "latest". TYPE: `str` DEFAULT: `'latest'`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame containing the SATP participants dataset.

Source code in src/soundscapy/databases/satp.py

def load_participants(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP participants dataset from Zenodo.

    Parameters
    ----------
    version
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    :
        DataFrame containing the SATP participants dataset.

    """
    resolved = SATPVersion(version)
    if SATPVersion(version) > SATPVersion.V1_2_1:
        msg = "Participant data is only available for SATP versions up to v1.2.1."
        raise ValueError(msg)
    logger.debug(f"Fetching SATP dataset URL for version: {resolved}")
    data = pd.read_excel(resolved.url, engine="openpyxl", sheet_name="Participants")
    data = data.drop(columns=["Unnamed: 3", "Unnamed: 4"])
    logger.info(f"Loaded SATP participants dataset version {resolved} from Zenodo")
    return data
