Databases

soundscapy.databases

Soundscapy Databases Module.

This module handles connections to and operations on soundscape databases, primarily focused on the International Soundscape Database (ISD).

MODULE DESCRIPTION
araus

Customized functions specifically for the ARAUS dataset.

isd

Module for handling the International Soundscape Database (ISD).

satp

Module for handling the Soundscape Attributes Translation Project (SATP) database.

ISD

soundscapy.databases.isd

Module for handling the International Soundscape Database (ISD).

This module provides functions for loading, validating, and analyzing data from the International Soundscape Database. It includes utilities for data retrieval, quality checks, and basic analysis operations.

Notes

The ISD is a large-scale database of soundscape surveys and recordings collected across multiple cities. This module is designed to work with the specific structure and content of the ISD.

Examples:

>>> import pandas as pd
>>> import soundscapy.databases.isd as isd
>>> df = isd.load()
>>> isinstance(df, pd.DataFrame)
True
>>> 'PAQ1' in df.columns
True
FUNCTION DESCRIPTION
load

Load the example "ISD" csv file to a DataFrame.

load_zenodo

Automatically fetch and load the ISD dataset from Zenodo.

validate

Perform data quality checks and validate that the dataset fits the expected format.

match_col_to_likert_scale

Match a column in the DataFrame to the Likert scale.

likert_categorical_from_data

Get the Likert labels for a specific column in the DataFrame.

select_record_ids

Filter the dataframe by RecordID.

select_group_ids

Filter the dataframe by GroupID.

select_session_ids

Filter the dataframe by SessionID.

select_location_ids

Filter the dataframe by LocationID.

describe_location

Return a summary of the data for a specific location.

soundscapy_describe

Return a summary of the data grouped by a specified column.

load

load(locations: list[str] | None = None) -> pd.DataFrame

Load the example "ISD" csv file to a DataFrame.

PARAMETER DESCRIPTION
locations

Optional list of LocationIDs to filter the data by. If None, loads all data.

TYPE: list[str] | None DEFAULT: None

RETURNS DESCRIPTION
DataFrame

DataFrame containing ISD data.

Notes

This function loads the ISD data from a local CSV file included with the soundscapy package.

References

Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M., Lionello, M., & Kang, J. (2022). The International Soundscape Database: An integrated multimedia database of urban soundscape surveys -- questionnaires with acoustical and contextual information (0.2.4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6331810

Examples:

>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load()
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in src/soundscapy/databases/isd.py
def load(locations: list[str] | None = None) -> pd.DataFrame:
    """
    Load the example "ISD" csv file to a DataFrame.

    Parameters
    ----------
    locations
        Optional list of LocationIDs to filter the data by. If None, loads all data.

    Returns
    -------
    :
        DataFrame containing ISD data.

    Notes
    -----
    This function loads the ISD data from a local CSV file included
    with the soundscapy package.

    References
    ----------
    Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M.,
    Lionello, M., & Kang, J. (2022). The International Soundscape Database:
    An integrated multimedia database of urban soundscape surveys --
    questionnaires with acoustical and contextual information (0.2.4) [Data set].
    Zenodo. https://doi.org/10.5281/zenodo.6331810

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS
    >>> df = load()
    >>> isinstance(df, pd.DataFrame)
    True
    >>> set(PAQ_IDS).issubset(df.columns)
    True

    """
    isd_resource = resources.files("soundscapy.data").joinpath("ISD v1.0 Data.csv")
    with resources.as_file(isd_resource) as f:
        data = pd.read_csv(f)
    data = rename_paqs(data, _PAQ_ALIASES)
    logger.info("Loaded ISD data from Soundscapy's included CSV file.")

    if locations is not None:
        data = select_location_ids(data, locations)

    return data

load_zenodo

load_zenodo(version: str = 'latest') -> pd.DataFrame

Automatically fetch and load the ISD dataset from Zenodo.

PARAMETER DESCRIPTION
version

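Version of the dataset to fetch: one of "v0.2.0", "v0.2.1", "v0.2.2", "v0.2.3", "v1.0.0", "v1.0.1", or "latest" (the default, currently an alias for "v1.0.1"). Matching is case-insensitive.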

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing ISD data.

RAISES DESCRIPTION
ValueError

If the specified version is not recognized.

Notes

This function fetches the ISD data directly from Zenodo, allowing access to different versions of the dataset.
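
The version lookup described here can be sketched without any network access. The following is a minimal illustration, not the library's code: `DEMO_URLS` and `resolve_version` are hypothetical names, the URLs are placeholders, and the file type is inferred from the URL extension (the actual function decides it from the version string).

```python
# Hypothetical stand-in table: the keys mirror version strings accepted by
# load_zenodo, but the URLs are placeholders rather than real Zenodo links.
DEMO_URLS = {
    "v0.2.2": "https://example.org/isd-v0.2.2.xlsx",
    "v1.0.0": "https://example.org/isd-v1.0.csv",
    "v1.0.1": "https://example.org/isd-v1.0.csv",
}


def resolve_version(version: str = "latest") -> tuple[str, str, str]:
    """Return (version, url, file_type) for a requested dataset version."""
    version = version.lower()
    # "latest" is treated as an alias for the newest entry in the table.
    if version == "latest":
        version = "v1.0.1"
    if version not in DEMO_URLS:
        raise ValueError(f"Version {version} not recognised.")
    url = DEMO_URLS[version]
    # Assumption of this sketch: choose the reader from the file extension.
    file_type = "csv" if url.endswith(".csv") else "excel"
    return version, url, file_type


print(resolve_version("LATEST"))
```

Resolving the alias before the membership check keeps the error message tied to a concrete version string.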

Examples:

>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load_zenodo("v1.0.1")
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in src/soundscapy/databases/isd.py
def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Automatically fetch and load the ISD dataset from Zenodo.

    Parameters
    ----------
    version
        Version number of the dataset to fetch, by default "latest".

    Returns
    -------
    :
        DataFrame containing ISD data.

    Raises
    ------
    ValueError
        If the specified version is not recognized.

    Notes
    -----
    This function fetches the ISD data directly from Zenodo, allowing
    access to different versions of the dataset.

    Examples
    --------
    >>> from soundscapy.surveys.survey_utils import PAQ_IDS  # doctest: +SKIP
    >>> df = load_zenodo("v1.0.1")  # doctest: +SKIP
    >>> isinstance(df, pd.DataFrame)  # doctest: +SKIP
    True
    >>> set(PAQ_IDS).issubset(df.columns)  # doctest: +SKIP
    True

    """
    version = version.lower()
    version = "v1.0.1" if version == "latest" else version

    url_mapping = {
        "v0.2.0": "https://zenodo.org/record/5578573/files/SSID%20Lockdown%20Database%20VL0.2.1.xlsx",
        "v0.2.1": "https://zenodo.org/record/5578573/files/SSID%20Lockdown%20Database%20VL0.2.1.xlsx",
        "v0.2.2": "https://zenodo.org/record/5705908/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
        "v0.2.3": "https://zenodo.org/record/5914762/files/SSID%20Lockdown%20Database%20VL0.2.2.xlsx",
        "v1.0.0": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
        "v1.0.1": "https://zenodo.org/records/10639661/files/ISD%20v1.0%20Data.csv",
    }

    if version not in url_mapping:
        msg = f"Version {version} not recognised."
        raise ValueError(msg)

    url = url_mapping[version]
    file_type = "csv" if version in ["v1.0.0", "v1.0.1"] else "excel"

    data = (
        pd.read_csv(url)
        if file_type == "csv"
        else pd.read_excel(url, engine="openpyxl")
    )
    data = rename_paqs(data, _PAQ_ALIASES)

    logger.info(f"Loaded ISD data version {version} from Zenodo")
    return data

validate

validate(
    df: DataFrame,
    paq_aliases: list | dict = _PAQ_ALIASES,
    val_range: tuple[int, int] = (1, 5),
    *,
    allow_paq_na: bool = False,
) -> tuple[pd.DataFrame, pd.DataFrame | None]

Perform data quality checks and validate that the dataset fits the expected format.

PARAMETER DESCRIPTION
df

ISD style dataframe, including PAQ data.

TYPE: DataFrame

paq_aliases

List of PAQ names (in order) or dict of PAQ names with new names as values.

TYPE: list | dict DEFAULT: _PAQ_ALIASES

allow_paq_na

If True, allow NaN values in PAQ data, by default False.

TYPE: bool DEFAULT: False

val_range

Min and max range of the PAQ response values, by default (1, 5).

TYPE: tuple[int, int] DEFAULT: (1, 5)

RETURNS DESCRIPTION
tuple[DataFrame, DataFrame | None]

Tuple containing the cleaned dataframe and a dataframe of the excluded samples. The second element is None when no rows were excluded.

Notes

This function renames PAQ columns, checks PAQ data quality, and removes rows that fail the checks. Missing PAQ values are tolerated only when allow_paq_na is True.
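
To make the keep/exclude split concrete, here is a minimal sketch that works on plain lists instead of a DataFrame. It assumes only two checks (missing values and out-of-range values); the real checks live in `likert_data_quality`, which is not shown on this page and may apply further rules. `split_valid_rows` is a hypothetical name.

```python
import math


def split_valid_rows(rows, val_range=(1, 5), allow_na=False):
    """Split rows (lists of responses) into kept rows and excluded rows."""
    low, high = val_range
    kept, excluded = [], []
    for row in rows:
        ok = True
        for value in row:
            if isinstance(value, float) and math.isnan(value):
                # Missing responses are only acceptable when allow_na is set.
                if not allow_na:
                    ok = False
            elif not low <= value <= high:
                # Any response outside the allowed range invalidates the row.
                ok = False
        (kept if ok else excluded).append(row)
    # Mirror validate(): report None when nothing was excluded.
    return kept, (excluded or None)


rows = [[float("nan"), 3, 2], [2, 4, 1], [3, 6, 3]]
kept, excluded = split_valid_rows(rows, allow_na=True)
print(len(kept), excluded)
```

With `allow_na=False` the first row would be excluded as well, which is the default behaviour described for `allow_paq_na`.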

Examples:

>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
... })
>>> clean_df, excl_df = validate(df, allow_paq_na=True)
>>> clean_df.shape[0]
2
>>> excl_df.shape[0]
2
Source code in src/soundscapy/databases/isd.py
def validate(
    df: pd.DataFrame,
    paq_aliases: list | dict = _PAQ_ALIASES,
    val_range: tuple[int, int] = (1, 5),
    *,
    allow_paq_na: bool = False,
) -> tuple[pd.DataFrame, pd.DataFrame | None]:
    """
    Perform data quality checks and validate that the dataset fits the expected format.

    Parameters
    ----------
    df
        ISD style dataframe, including PAQ data.
    paq_aliases
        List of PAQ names (in order) or dict of PAQ names with new names as values.
    allow_paq_na
        If True, allow NaN values in PAQ data, by default False.
    val_range
        Min and max range of the PAQ response values, by default (1, 5).

    Returns
    -------
    :
        Tuple containing the cleaned dataframe
        and optionally a dataframe of excluded samples.

    Notes
    -----
    This function renames PAQ columns, checks PAQ data quality, and optionally
    removes rows with invalid or missing PAQ values.

    Examples
    --------
    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({
    ...     'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
    ...     'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
    ...     'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
    ... })
    >>> clean_df, excl_df = validate(df, allow_paq_na=True)
    >>> clean_df.shape[0]
    2
    >>> excl_df.shape[0]
    2

    """
    logger.info("Validating ISD data")
    data = rename_paqs(df, paq_aliases)

    invalid_indices = likert_data_quality(
        data, val_range=val_range, allow_na=allow_paq_na
    )

    if invalid_indices:
        excl_data = data.iloc[invalid_indices]
        data = data.drop(data.index[invalid_indices])
        logger.info(f"Removed {len(invalid_indices)} rows with invalid PAQ data")
    else:
        excl_data = None
        logger.info("All PAQ data passed quality checks")

    return data, excl_data

match_col_to_likert_scale

match_col_to_likert_scale(col: str | None) -> Scale

Match a column in the DataFrame to the Likert scale.

PARAMETER DESCRIPTION
col

Column name to match.

TYPE: str | None

RETURNS DESCRIPTION
Scale

Likert scale object.

RAISES DESCRIPTION
ValueError

If the column does not match any known Likert scale.

Source code in src/soundscapy/databases/isd.py
def match_col_to_likert_scale(col: str | None) -> Scale:  # noqa: PLR0911
    """
    Match a column in the DataFrame to the Likert scale.

    Parameters
    ----------
    col
        Column name to match.

    Returns
    -------
    :
        Likert scale object.

    """
    if col in PAQ_IDS or col in PAQ_LABELS:
        return LIKERT_SCALES.paq
    if col in ["traffic_noise", "other_noise", "human_sounds", "natural_sounds"]:
        return LIKERT_SCALES.source
    if col in ["overall_sound_environment"]:
        return LIKERT_SCALES.overall
    if col in ["appropriate"]:
        return LIKERT_SCALES.appropriate
    if col in ["perceived_loud"]:
        return LIKERT_SCALES.loud
    if col in ["visit_often"]:
        return LIKERT_SCALES.often
    if col in ["like_to_visit"]:
        return LIKERT_SCALES.visit

    msg = f"Column {col} does not match any known Likert scale."
    raise ValueError(msg)

likert_categorical_from_data

likert_categorical_from_data(
    data: Series,
) -> pd.Categorical

Get the Likert labels for a specific column in the DataFrame.

PARAMETER DESCRIPTION
data

Series containing the data.

TYPE: Series

RETURNS DESCRIPTION
Categorical

Ordered Categorical with Likert labels.

RAISES DESCRIPTION
ValueError

If the column does not match any known Likert scale.
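
The core of this conversion is turning 1-based responses into zero-based codes that index into the scale's labels. A minimal pure-Python sketch of that step follows; the label list is an assumed five-point agreement scale (the real labels come from `LIKERT_SCALES`), and `responses_to_labels` is a hypothetical helper.

```python
# Assumed five-point agreement labels; the real labels live in LIKERT_SCALES.
ASSUMED_PAQ_LABELS = [
    "Strongly disagree",
    "Somewhat disagree",
    "Neither agree nor disagree",
    "Somewhat agree",
    "Strongly agree",
]


def responses_to_labels(responses, labels=ASSUMED_PAQ_LABELS):
    """Map 1-based Likert responses to their labels via zero-based codes."""
    # Subtract one so that response 1 points at the first label.
    codes = [int(r) - 1 for r in responses]
    for code in codes:
        if not 0 <= code < len(labels):
            raise ValueError(f"Response {code + 1} is outside the scale.")
    return [labels[code] for code in codes]


print(responses_to_labels([1, 3, 5]))
```

The library function performs the same offset and then builds an ordered categorical from the codes, so the label order carries the ranking.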

Source code in src/soundscapy/databases/isd.py
def likert_categorical_from_data(
    data: pd.Series,
) -> pd.Categorical:
    """
    Get the Likert labels for a specific column in the DataFrame.

    Parameters
    ----------
    data
        Series containing the data.

    Returns
    -------
    :
        Ordered Categorical with Likert labels.

    Raises
    ------
    ValueError
        If the column does not match any known Likert scale.

    """
    likert_scale = match_col_to_likert_scale(str(data.name))
    if isinstance(data, pd.Categorical):
        return data

    data = data.astype("int") - 1  # Convert to zero-based index
    codes = data.to_list()

    return pd.Categorical.from_codes(
        codes,
        dtype=CategoricalDtype(categories=likert_scale, ordered=True),
    )

select_record_ids

select_record_ids(
    data: DataFrame, record_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by RecordID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

record_ids

RecordID(s) to filter by.

TYPE: str | int | list | tuple

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.
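
All four `select_*_ids` functions delegate to a private helper, `_isd_select`, whose source is not shown here. The sketch below illustrates one plausible way such a filter could accept either a single ID or a collection of IDs; `select_by_id` is a hypothetical name and it operates on a list of dicts rather than a DataFrame.

```python
def select_by_id(rows, column, ids):
    """Keep the rows whose `column` value is one of `ids`."""
    # Accept a single ID as well as a list or tuple of IDs.
    if isinstance(ids, (str, int)):
        ids = [ids]
    wanted = set(ids)
    # Preserve the original row order while filtering.
    return [row for row in rows if row[column] in wanted]


rows = [
    {"RecordID": "A", "Value": 1},
    {"RecordID": "B", "Value": 2},
    {"RecordID": "C", "Value": 3},
]
print(select_by_id(rows, "RecordID", ["A", "C"]))
print(select_by_id(rows, "RecordID", "B"))
```

Wrapping a scalar in a list first means a string ID such as "B" is matched as a whole rather than character by character.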

Examples:

>>> df = pd.DataFrame({
...     'RecordID': ['A', 'B', 'C', 'D'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_record_ids(df, ['A', 'C'])
  RecordID  Value
0        A      1
2        C      3
Source code in src/soundscapy/databases/isd.py
def select_record_ids(
    data: pd.DataFrame, record_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by RecordID.

    Parameters
    ----------
    data
        ISD dataframe.
    record_ids
        RecordID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'RecordID': ['A', 'B', 'C', 'D'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_record_ids(df, ['A', 'C'])
      RecordID  Value
    0        A      1
    2        C      3

    """
    return _isd_select(data, "RecordID", record_ids)

select_group_ids

select_group_ids(
    data: DataFrame, group_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by GroupID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

group_ids

GroupID(s) to filter by.

TYPE: str | int | list | tuple

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'GroupID': ['G1', 'G1', 'G2', 'G2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_group_ids(df, 'G1')
  GroupID  Value
0      G1      1
1      G1      2
Source code in src/soundscapy/databases/isd.py
def select_group_ids(
    data: pd.DataFrame, group_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by GroupID.

    Parameters
    ----------
    data
        ISD dataframe.
    group_ids
        GroupID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'GroupID': ['G1', 'G1', 'G2', 'G2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_group_ids(df, 'G1')
      GroupID  Value
    0      G1      1
    1      G1      2

    """
    return _isd_select(data, "GroupID", group_ids)

select_session_ids

select_session_ids(
    data: DataFrame, session_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by SessionID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

session_ids

SessionID(s) to filter by.

TYPE: str | int | list | tuple

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'SessionID': ['S1', 'S1', 'S2', 'S2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_session_ids(df, ['S1', 'S2'])
  SessionID  Value
0        S1      1
1        S1      2
2        S2      3
3        S2      4
Source code in src/soundscapy/databases/isd.py
def select_session_ids(
    data: pd.DataFrame, session_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by SessionID.

    Parameters
    ----------
    data
        ISD dataframe.
    session_ids
        SessionID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'SessionID': ['S1', 'S1', 'S2', 'S2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_session_ids(df, ['S1', 'S2'])
      SessionID  Value
    0        S1      1
    1        S1      2
    2        S2      3
    3        S2      4

    """
    return _isd_select(data, "SessionID", session_ids)

select_location_ids

select_location_ids(
    data: DataFrame, location_ids: str | int | list | tuple
) -> pd.DataFrame

Filter the dataframe by LocationID.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

location_ids

LocationID(s) to filter by.

TYPE: str | int | list | tuple

RETURNS DESCRIPTION
DataFrame

Filtered dataframe.

Examples:

>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'Value': [1, 2, 3, 4]
... })
>>> select_location_ids(df, 'L2')
  LocationID  Value
2         L2      3
3         L2      4
Source code in src/soundscapy/databases/isd.py
def select_location_ids(
    data: pd.DataFrame, location_ids: str | int | list | tuple
) -> pd.DataFrame:
    """
    Filter the dataframe by LocationID.

    Parameters
    ----------
    data
        ISD dataframe.
    location_ids
        LocationID(s) to filter by.

    Returns
    -------
    :
        Filtered dataframe.

    Examples
    --------
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'Value': [1, 2, 3, 4]
    ... })
    >>> select_location_ids(df, 'L2')
      LocationID  Value
    2         L2      3
    3         L2      4

    """
    return _isd_select(data, "LocationID", location_ids)

describe_location

describe_location(
    data: DataFrame,
    location: str,
    calc_type: str = "percent",
    pl_threshold: float = 0,
    ev_threshold: float = 0,
) -> dict[str, int | float]

Return a summary of the data for a specific location.

PARAMETER DESCRIPTION
data

ISD dataframe.

TYPE: DataFrame

location

Location to describe.

TYPE: str

calc_type

Type of summary, either "percent" or "count", by default "percent". Note that "percent" returns proportions between 0 and 1 rather than percentages.

TYPE: str DEFAULT: 'percent'

pl_threshold

Pleasantness threshold: responses with ISOPleasant strictly above this value count as pleasant. By default 0.

TYPE: float DEFAULT: 0

ev_threshold

Eventfulness threshold: responses with ISOEventful strictly above this value count as eventful. By default 0.

TYPE: float DEFAULT: 0

RETURNS DESCRIPTION
dict[str, int | float]

Summary of the data for the specified location.
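
The quadrant figures in the summary come from comparing each response's ISOPleasant and ISOEventful values against the two thresholds. The following minimal sketch reproduces that classification on plain tuples; `quadrant_shares` is a hypothetical name, and the sample points are made up for illustration.

```python
def quadrant_shares(points, pl_threshold=0.0, ev_threshold=0.0):
    """Return the fraction of (pleasant, eventful) points in each quadrant."""
    counts = {"vibrant": 0, "chaotic": 0, "monotonous": 0, "calm": 0}
    for pleasant, eventful in points:
        if pleasant > pl_threshold and eventful > ev_threshold:
            counts["vibrant"] += 1
        elif pleasant < pl_threshold and eventful > ev_threshold:
            counts["chaotic"] += 1
        elif pleasant < pl_threshold and eventful < ev_threshold:
            counts["monotonous"] += 1
        elif pleasant > pl_threshold and eventful < ev_threshold:
            counts["calm"] += 1
        # Points lying exactly on a threshold fall into no quadrant.
    total = len(points)
    return {name: round(n / total, 3) for name, n in counts.items()}


# Made-up sample: the last point sits exactly on the pleasantness threshold.
points = [(0.5, 0.2), (-0.3, 0.4), (0.1, -0.6), (0.0, 0.3)]
shares = quadrant_shares(points)
print(shares)
```

Because the comparisons are strict, the four quadrant shares need not sum to 1 when responses sit exactly on a threshold.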

Examples:

>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'PAQ1': [4, 2, 3, 5],
...     'PAQ2': [3, 5, 2, 4],
...     'PAQ3': [2, 4, 1, 3],
...     'PAQ4': [1, 3, 4, 2],
...     'PAQ5': [5, 1, 5, 1],
...     'PAQ6': [4, 2, 3, 5],
...     'PAQ7': [3, 5, 2, 4],
...     'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = describe_location(df, 'L1')
>>> set(result.keys()) == {
...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
...     'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result['count']
2
Source code in src/soundscapy/databases/isd.py
def describe_location(
    data: pd.DataFrame,
    location: str,
    calc_type: str = "percent",
    pl_threshold: float = 0,
    ev_threshold: float = 0,
) -> dict[str, int | float]:
    """
    Return a summary of the data for a specific location.

    Parameters
    ----------
    data
        ISD dataframe.
    location
        Location to describe.
    calc_type
        Type of summary, either "percent" or "count", by default "percent".
        Note that "percent" returns proportions between 0 and 1.
    pl_threshold
        Pleasantness threshold, by default 0.
    ev_threshold
        Eventfulness threshold, by default 0.

    Returns
    -------
    :
        Summary of the data for the specified location.

    Examples
    --------
    >>> from soundscapy.surveys.processing import add_iso_coords
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'PAQ1': [4, 2, 3, 5],
    ...     'PAQ2': [3, 5, 2, 4],
    ...     'PAQ3': [2, 4, 1, 3],
    ...     'PAQ4': [1, 3, 4, 2],
    ...     'PAQ5': [5, 1, 5, 1],
    ...     'PAQ6': [4, 2, 3, 5],
    ...     'PAQ7': [3, 5, 2, 4],
    ...     'PAQ8': [2, 4, 1, 3],
    ... })
    >>> df = add_iso_coords(df)
    >>> result = describe_location(df, 'L1')
    >>> set(result.keys()) == {
    ...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
    ...     'vibrant', 'chaotic', 'monotonous', 'calm'
    ... }
    True
    >>> result['count']
    2

    """
    loc_df = select_location_ids(data, location_ids=location)
    count = len(loc_df)

    if "ISOPleasant" not in loc_df.columns or "ISOEventful" not in loc_df.columns:
        iso_pleasant, iso_eventful = calculate_iso_coords(loc_df)
        loc_df = loc_df.assign(ISOPleasant=iso_pleasant, ISOEventful=iso_eventful)

    pl_count = (loc_df["ISOPleasant"] > pl_threshold).sum()
    ev_count = (loc_df["ISOEventful"] > ev_threshold).sum()
    vibrant_count = (
        (loc_df["ISOPleasant"] > pl_threshold) & (loc_df["ISOEventful"] > ev_threshold)
    ).sum()
    chaotic_count = (
        (loc_df["ISOPleasant"] < pl_threshold) & (loc_df["ISOEventful"] > ev_threshold)
    ).sum()
    mono_count = (
        (loc_df["ISOPleasant"] < pl_threshold) & (loc_df["ISOEventful"] < ev_threshold)
    ).sum()
    calm_count = (
        (loc_df["ISOPleasant"] > pl_threshold) & (loc_df["ISOEventful"] < ev_threshold)
    ).sum()

    res = {
        "count": count,
        "ISOPleasant": loc_df["ISOPleasant"].mean(),
        "ISOEventful": loc_df["ISOEventful"].mean(),
    }

    if calc_type == "percent":
        res.update(
            {
                "pleasant": pl_count / count,
                "eventful": ev_count / count,
                "vibrant": vibrant_count / count,
                "chaotic": chaotic_count / count,
                "monotonous": mono_count / count,
                "calm": calm_count / count,
            }
        )
    elif calc_type == "count":
        res.update(
            {
                "pleasant": pl_count,
                "eventful": ev_count,
                "vibrant": vibrant_count,
                "chaotic": chaotic_count,
                "monotonous": mono_count,
                "calm": calm_count,
            }
        )
    else:
        msg = "Type must be either 'percent' or 'count'"
        raise ValueError(msg)

    return {k: round(v, 3) if isinstance(v, float) else v for k, v in res.items()}

soundscapy_describe

soundscapy_describe(
    df: DataFrame,
    group_by: str = "LocationID",
    calc_type: str = "percent",
) -> pd.DataFrame

Return a summary of the data grouped by a specified column.

PARAMETER DESCRIPTION
df

ISD dataframe.

TYPE: DataFrame

group_by

Column to group by, by default "LocationID".

TYPE: str DEFAULT: 'LocationID'

calc_type

Type of summary, either "percent" or "count", by default "percent". Note that "percent" returns proportions between 0 and 1 rather than percentages.

TYPE: str DEFAULT: 'percent'

RETURNS DESCRIPTION
DataFrame

Summary of the data.
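
The grouping pattern is: collect the rows for each distinct value of the grouping column, summarise each group, and index the result by group. A minimal sketch on a list of dicts is given below; `describe_by_group` is a hypothetical name and it computes only the row count and mean ISOPleasant, not the full summary.

```python
from collections import defaultdict


def describe_by_group(rows, group_by="LocationID"):
    """Summarise rows per group: row count and mean ISOPleasant."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_by]].append(row)
    summary = {}
    for key, members in groups.items():
        mean_pl = sum(r["ISOPleasant"] for r in members) / len(members)
        summary[key] = {"count": len(members), "ISOPleasant": round(mean_pl, 3)}
    return summary


rows = [
    {"LocationID": "L1", "ISOPleasant": 0.5},
    {"LocationID": "L1", "ISOPleasant": 0.25},
    {"LocationID": "L2", "ISOPleasant": -0.5},
]
print(describe_by_group(rows))
```

A dict keyed by group maps directly onto a table with one row per group, which is the shape the library function returns.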

Examples:

>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
...     'PAQ1': [4, 2, 3, 5],
...     'PAQ2': [3, 5, 2, 4],
...     'PAQ3': [2, 4, 1, 3],
...     'PAQ4': [1, 3, 4, 2],
...     'PAQ5': [5, 1, 5, 1],
...     'PAQ6': [4, 2, 3, 5],
...     'PAQ7': [3, 5, 2, 4],
...     'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = soundscapy_describe(df)
>>> isinstance(result, pd.DataFrame)
True
>>> result.index.tolist()
['L1', 'L2']
>>> set(result.columns) == {
...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
...     'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result = soundscapy_describe(df, calc_type="count")
>>> result.loc['L1', 'count']
2
Source code in src/soundscapy/databases/isd.py
def soundscapy_describe(
    df: pd.DataFrame, group_by: str = "LocationID", calc_type: str = "percent"
) -> pd.DataFrame:
    """
    Return a summary of the data grouped by a specified column.

    Parameters
    ----------
    df
        ISD dataframe.
    group_by
        Column to group by, by default "LocationID".
    calc_type
        Type of summary, either "percent" or "count", by default "percent".
        Note that "percent" returns proportions between 0 and 1.

    Returns
    -------
    :
        Summary of the data.

    Examples
    --------
    >>> from soundscapy.surveys.processing import add_iso_coords
    >>> df = pd.DataFrame({
    ...     'LocationID': ['L1', 'L1', 'L2', 'L2'],
    ...     'PAQ1': [4, 2, 3, 5],
    ...     'PAQ2': [3, 5, 2, 4],
    ...     'PAQ3': [2, 4, 1, 3],
    ...     'PAQ4': [1, 3, 4, 2],
    ...     'PAQ5': [5, 1, 5, 1],
    ...     'PAQ6': [4, 2, 3, 5],
    ...     'PAQ7': [3, 5, 2, 4],
    ...     'PAQ8': [2, 4, 1, 3],
    ... })
    >>> df = add_iso_coords(df)
    >>> result = soundscapy_describe(df)
    >>> isinstance(result, pd.DataFrame)
    True
    >>> result.index.tolist()
    ['L1', 'L2']
    >>> set(result.columns) == {
    ...     'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
    ...     'vibrant', 'chaotic', 'monotonous', 'calm'
    ... }
    True
    >>> result = soundscapy_describe(df, calc_type="count")
    >>> result.loc['L1', 'count']
    2

    """
    res = {
        location: describe_location(df, location, calc_type=calc_type)
        for location in df[group_by].unique()
    }
    return pd.DataFrame.from_dict(res, orient="index")

ARAUS

soundscapy.databases.araus

Customized functions specifically for the ARAUS dataset.

SATP database helpers

soundscapy.databases.satp

Module for handling the Soundscape Attributes Translation Project (SATP) database.

This module provides functions for loading and processing data from the Soundscape Attributes Translation Project database. It includes utilities for data retrieval from Zenodo and basic data loading operations.

Examples:

>>> import pandas as pd
>>> import soundscapy.databases.satp as satp
>>> df = satp.load_zenodo()
>>> isinstance(df, pd.DataFrame)
True
>>> 'Language' in df.columns
True
>>> satp.load_participants()
Traceback (most recent call last):
    ...
ValueError: Participant data is only available for SATP versions up to v1.2.1.
>>> participants = satp.load_participants(version="v1.2")
>>> isinstance(participants, pd.DataFrame)
True
>>> 'Age' in participants.columns
True
CLASS DESCRIPTION
SATPVersion

Versioned SATP dataset releases on Zenodo.

FUNCTION DESCRIPTION
load_zenodo

Load the SATP dataset from Zenodo.

load_participants

Load the SATP participants dataset from Zenodo.

SATPVersion

Bases: Enum

Versioned SATP dataset releases on Zenodo.

Each member stores the canonical version string and its Zenodo download URL. Version strings are normalised on lookup so "1.5", "v1.5", and "V1.5" all resolve to the same member. The string "latest" resolves to the first (newest) member.
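
How the normalised lookup is implemented is not shown in the source listings on this page. One way to get this behaviour from a standard `Enum` is the `_missing_` hook, sketched below. `DemoVersion` is a hypothetical stand-in, its URLs are placeholders, and versions are compared as integer tuples here rather than with a version-parsing library.

```python
from enum import Enum


class DemoVersion(Enum):
    """Hypothetical stand-in for SATPVersion; URLs are placeholders."""

    # The newest release is declared first so that iteration starts with it.
    V1_5 = ("v1.5", "https://example.org/demo-v1.5.xlsx")
    V1_2_1 = ("v1.2.1", "https://example.org/demo-v1.2.1.xlsx")
    V1_2 = ("v1.2", "https://example.org/demo-v1.2.xlsx")

    def __new__(cls, version, url):
        obj = object.__new__(cls)
        obj._value_ = version  # canonical string used for value lookup
        obj.url = url
        return obj

    @classmethod
    def _missing_(cls, value):
        # Called only when the exact value lookup fails.
        if not isinstance(value, str):
            return None
        key = value.strip().lower()
        if key == "latest":
            return next(iter(cls))
        if not key.startswith("v"):
            key = "v" + key
        for member in cls:
            if member.value == key:
                return member
        return None  # lets Enum raise its usual ValueError

    def _as_tuple(self):
        return tuple(int(part) for part in self.value.lstrip("v").split("."))

    def __lt__(self, other):
        if not isinstance(other, DemoVersion):
            return NotImplemented
        return self._as_tuple() < other._as_tuple()


print(DemoVersion("1.5") is DemoVersion("V1.5"))
print(DemoVersion("latest").url)
```

Defining only `__lt__` is enough for `>` comparisons between two members, because Python falls back to the reflected `__lt__` of the right-hand operand.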

Examples:

>>> SATPVersion("v1.2").url
'https://zenodo.org/record/7143599/files/SATP%20Dataset%20v1.2.xlsx'
>>> SATPVersion("1.2") is SATPVersion("V1.2")
True
>>> SATPVersion("latest") is SATPVersion.V1_5
True
>>> SATPVersion("invalid")
Traceback (most recent call last):
    ...
ValueError: 'invalid' is not a valid SATPVersion
METHOD DESCRIPTION
__new__

Create a new member with a canonical version string and download URL.

latest

Return the most recent released version (first declared member).

__lt__

Return True if this version is older than other.

__str__

Return the canonical version string.

__new__
__new__(version: str, url: str) -> Self

Create a new member with a canonical version string and download URL.

Source code in src/soundscapy/databases/satp.py
def __new__(cls, version: str, url: str) -> Self:
    """Create a new member with a canonical version string and download URL."""
    obj = object.__new__(cls)
    obj._value_ = version
    obj.url = url
    return obj
latest classmethod
latest() -> SATPVersion

Return the most recent released version (first declared member).

Source code in src/soundscapy/databases/satp.py
@classmethod
def latest(cls) -> SATPVersion:
    """Return the most recent released version (first declared member)."""
    return next(iter(cls))
__lt__
__lt__(other: object) -> bool

Return True if this version is older than other.

Source code in src/soundscapy/databases/satp.py
def __lt__(self, other: object) -> bool:
    """Return True if this version is older than other."""
    if not isinstance(other, SATPVersion):
        return NotImplemented
    return Version(str(self)) < Version(str(other))
__str__
__str__() -> str

Return the canonical version string.

Source code in src/soundscapy/databases/satp.py
def __str__(self) -> str:
    """Return the canonical version string."""
    return self.value

load_zenodo

load_zenodo(version: str = 'latest') -> pd.DataFrame

Load the SATP dataset from Zenodo.

PARAMETER DESCRIPTION
version

Version of the dataset to load. The default is "latest".

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SATP dataset.

Source code in src/soundscapy/databases/satp.py
def load_zenodo(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP dataset from Zenodo.

    Parameters
    ----------
    version
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    :
        DataFrame containing the SATP dataset.

    """
    resolved = SATPVersion(version)
    logger.debug(f"Fetching SATP dataset URL for version: {resolved}")
    data = pd.read_excel(resolved.url, engine="openpyxl", sheet_name="Main Merge")
    logger.info(f"Loaded SATP dataset version {resolved} from Zenodo")
    return data

load_participants

load_participants(version: str = 'latest') -> pd.DataFrame

Load the SATP participants dataset from Zenodo.

PARAMETER DESCRIPTION
version

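Version of the dataset to load. The default is "latest". Participant data is only available for versions up to v1.2.1, so the default raises ValueError; pass an explicit version such as "v1.2".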

TYPE: str DEFAULT: 'latest'

RETURNS DESCRIPTION
DataFrame

DataFrame containing the SATP participants dataset.

Source code in src/soundscapy/databases/satp.py
def load_participants(version: str = "latest") -> pd.DataFrame:
    """
    Load the SATP participants dataset from Zenodo.

    Parameters
    ----------
    version
        Version of the dataset to load. The default is "latest".

    Returns
    -------
    :
        DataFrame containing the SATP participants dataset.

    """
    resolved = SATPVersion(version)
    if resolved > SATPVersion.V1_2_1:
        msg = "Participant data is only available for SATP versions up to v1.2.1."
        raise ValueError(msg)
    logger.debug(f"Fetching SATP dataset URL for version: {resolved}")
    data = pd.read_excel(resolved.url, engine="openpyxl", sheet_name="Participants")
    data = data.drop(columns=["Unnamed: 3", "Unnamed: 4"])
    logger.info(f"Loaded SATP participants dataset version {resolved} from Zenodo")
    return data