Databases
soundscapy.databases
Soundscapy Databases Module.
This module handles connections to and operations on soundscape databases, primarily focused on the International Soundscape Database (ISD).
| MODULE | DESCRIPTION |
|---|---|
| araus | Customized functions specifically for the ARAUS dataset. |
| isd | Module for handling the International Soundscape Database (ISD). |
| satp | Module for handling the Soundscape Attributes Translation Project (SATP) database. |
ISD
soundscapy.databases.isd
Module for handling the International Soundscape Database (ISD).
This module provides functions for loading, validating, and analyzing data from the International Soundscape Database. It includes utilities for data retrieval, quality checks, and basic analysis operations.
Notes
The ISD is a large-scale database of soundscape surveys and recordings collected across multiple cities. This module is designed to work with the specific structure and content of the ISD.
Examples:
>>> import pandas as pd
>>> import soundscapy.databases.isd as isd
>>> df = isd.load()
>>> isinstance(df, pd.DataFrame)
True
>>> 'PAQ1' in df.columns
True
| FUNCTION | DESCRIPTION |
|---|---|
| load | Load the example "ISD" csv file to a DataFrame. |
| load_zenodo | Automatically fetch and load the ISD dataset from Zenodo. |
| validate | Perform data quality checks and validate that the dataset fits the expected format. |
| match_col_to_likert_scale | Match a column in the DataFrame to the Likert scale. |
| likert_categorical_from_data | Get the Likert labels for a specific column in the DataFrame. |
| select_record_ids | Filter the dataframe by RecordID. |
| select_group_ids | Filter the dataframe by GroupID. |
| select_session_ids | Filter the dataframe by SessionID. |
| select_location_ids | Filter the dataframe by LocationID. |
| describe_location | Return a summary of the data for a specific location. |
| soundscapy_describe | Return a summary of the data grouped by a specified column. |
load
Load the example "ISD" csv file to a DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
| locations | Optional list of LocationIDs to filter the data by. If None, loads all data. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame containing ISD data. |
Notes
This function loads the ISD data from a local CSV file included with the soundscapy package.
References
Mitchell, A., Oberman, T., Aletta, F., Erfanian, M., Kachlicka, M., Lionello, M., & Kang, J. (2022). The International Soundscape Database: An integrated multimedia database of urban soundscape surveys -- questionnaires with acoustical and contextual information (0.2.4) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.6331810
Examples:
>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load()
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in src/soundscapy/databases/isd.py
load_zenodo
Automatically fetch and load the ISD dataset from Zenodo.
| PARAMETER | DESCRIPTION |
|---|---|
| version | Version number of the dataset to fetch, by default "latest". |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame containing ISD data. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the specified version is not recognized. |
Notes
This function fetches the ISD data directly from Zenodo, allowing access to different versions of the dataset.
Examples:
>>> from soundscapy.surveys.survey_utils import PAQ_IDS
>>> df = load_zenodo("v1.0.1")
>>> isinstance(df, pd.DataFrame)
True
>>> set(PAQ_IDS).issubset(df.columns)
True
Source code in src/soundscapy/databases/isd.py
validate
validate(
    df: DataFrame,
    paq_aliases: list | dict = _PAQ_ALIASES,
    val_range: tuple[int, int] = (1, 5),
    *,
    allow_paq_na: bool = False,
) -> tuple[pd.DataFrame, pd.DataFrame | None]
Perform data quality checks and validate that the dataset fits the expected format.
| PARAMETER | DESCRIPTION |
|---|---|
| df | ISD style dataframe, including PAQ data. TYPE: DataFrame |
| paq_aliases | List of PAQ names (in order) or dict of PAQ names with new names as values. |
| allow_paq_na | If True, allow NaN values in PAQ data, by default False. TYPE: bool |
| val_range | Min and max range of the PAQ response values, by default (1, 5). |

| RETURNS | DESCRIPTION |
|---|---|
| tuple[DataFrame, DataFrame \| None] | Tuple containing the cleaned dataframe and optionally a dataframe of excluded samples. |
Notes
This function renames PAQ columns, checks PAQ data quality, and optionally removes rows with invalid or missing PAQ values.
Examples:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame({
... 'PAQ1': [np.nan, 2, 3, 3], 'PAQ2': [3, 2, 6, 3], 'PAQ3': [2, 2, 3, 3],
... 'PAQ4': [1, 2, 3, 3], 'PAQ5': [5, 2, 3, 3], 'PAQ6': [3, 2, 3, 3],
... 'PAQ7': [4, 2, 3, 3], 'PAQ8': [2, 2, 3, 3]
... })
>>> clean_df, excl_df = validate(df, allow_paq_na=True)
>>> clean_df.shape[0]
2
>>> excl_df.shape[0]
2
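The notes above mention missing-value and range checks on the PAQ responses. As a rough illustration only, the sketch below applies those two checks to plain Python dictionaries. The names `PAQ_COLS`, `is_missing`, and `split_valid_rows` are hypothetical and not part of soundscapy, and `validate` itself may apply further rules, so its counts can differ from this sketch.

```python
import math

# Hypothetical column names, mirroring the PAQ1..PAQ8 columns used in the examples.
PAQ_COLS = [f"PAQ{i}" for i in range(1, 9)]


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_valid_rows(rows, val_range=(1, 5), allow_paq_na=False):
    """Split rows into (kept, excluded) using only a range and missing-value check."""
    low, high = val_range
    kept, excluded = [], []
    for row in rows:
        values = [row[col] for col in PAQ_COLS]
        has_missing = any(is_missing(v) for v in values)
        out_of_range = any(
            not is_missing(v) and not (low <= v <= high) for v in values
        )
        if out_of_range or (has_missing and not allow_paq_na):
            excluded.append(row)
        else:
            kept.append(row)
    return kept, excluded


neutral = dict.fromkeys(PAQ_COLS, 3)
rows = [
    {**neutral, "PAQ1": float("nan")},  # missing value
    {**neutral, "PAQ2": 6},             # outside the 1-5 range
    {**neutral, "PAQ1": 4},             # passes both checks
]
kept, excluded = split_valid_rows(rows, allow_paq_na=True)
print(len(kept), len(excluded))  # 2 1
```

Returning the excluded rows alongside the kept ones mirrors the tuple that `validate` is documented to return, which makes it easy to inspect why samples were dropped.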
Source code in src/soundscapy/databases/isd.py
match_col_to_likert_scale
Match a column in the DataFrame to the Likert scale.
| PARAMETER | DESCRIPTION |
|---|---|
| col | Column name to match. |

| RETURNS | DESCRIPTION |
|---|---|
| Scale | Likert scale object. |
Source code in src/soundscapy/databases/isd.py
likert_categorical_from_data
Get the Likert labels for a specific column in the DataFrame.
| PARAMETER | DESCRIPTION |
|---|---|
| data | Series containing the data. |

| RETURNS | DESCRIPTION |
|---|---|
| Categorical | Series with Likert labels. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the column does not match any known Likert scale. |
Source code in src/soundscapy/databases/isd.py
select_record_ids
Filter the dataframe by RecordID.
| PARAMETER | DESCRIPTION |
|---|---|
| data | ISD dataframe. |
| record_ids | RecordID(s) to filter by. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Filtered dataframe. |
Examples:
>>> df = pd.DataFrame({
... 'RecordID': ['A', 'B', 'C', 'D'],
... 'Value': [1, 2, 3, 4]
... })
>>> select_record_ids(df, ['A', 'C'])
RecordID Value
0 A 1
2 C 3
Source code in src/soundscapy/databases/isd.py
select_group_ids
Filter the dataframe by GroupID.
| PARAMETER | DESCRIPTION |
|---|---|
| data | ISD dataframe. |
| group_ids | GroupID(s) to filter by. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Filtered dataframe. |
Examples:
>>> df = pd.DataFrame({
... 'GroupID': ['G1', 'G1', 'G2', 'G2'],
... 'Value': [1, 2, 3, 4]
... })
>>> select_group_ids(df, 'G1')
GroupID Value
0 G1 1
1 G1 2
Source code in src/soundscapy/databases/isd.py
select_session_ids
Filter the dataframe by SessionID.
| PARAMETER | DESCRIPTION |
|---|---|
| data | ISD dataframe. |
| session_ids | SessionID(s) to filter by. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Filtered dataframe. |
Examples:
>>> df = pd.DataFrame({
... 'SessionID': ['S1', 'S1', 'S2', 'S2'],
... 'Value': [1, 2, 3, 4]
... })
>>> select_session_ids(df, ['S1', 'S2'])
SessionID Value
0 S1 1
1 S1 2
2 S2 3
3 S2 4
Source code in src/soundscapy/databases/isd.py
select_location_ids
Filter the dataframe by LocationID.
| PARAMETER | DESCRIPTION |
|---|---|
| data | ISD dataframe. |
| location_ids | LocationID(s) to filter by. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Filtered dataframe. |
Examples:
>>> df = pd.DataFrame({
... 'LocationID': ['L1', 'L1', 'L2', 'L2'],
... 'Value': [1, 2, 3, 4]
... })
>>> select_location_ids(df, 'L2')
LocationID Value
2 L2 3
3 L2 4
Source code in src/soundscapy/databases/isd.py
describe_location
describe_location(
    data: DataFrame,
    location: str,
    calc_type: str = "percent",
    pl_threshold: float = 0,
    ev_threshold: float = 0,
) -> dict[str, int | float]
Return a summary of the data for a specific location.
| PARAMETER | DESCRIPTION |
|---|---|
| data | ISD dataframe. TYPE: DataFrame |
| location | Location to describe. TYPE: str |
| calc_type | Type of summary, either "percent" or "count", by default "percent". TYPE: str |
| pl_threshold | Pleasantness threshold, by default 0. TYPE: float |
| ev_threshold | Eventfulness threshold, by default 0. TYPE: float |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, int \| float] | Summary of the data for the specified location. |
Examples:
>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
... 'LocationID': ['L1', 'L1', 'L2', 'L2'],
... 'PAQ1': [4, 2, 3, 5],
... 'PAQ2': [3, 5, 2, 4],
... 'PAQ3': [2, 4, 1, 3],
... 'PAQ4': [1, 3, 4, 2],
... 'PAQ5': [5, 1, 5, 1],
... 'PAQ6': [4, 2, 3, 5],
... 'PAQ7': [3, 5, 2, 4],
... 'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = describe_location(df, 'L1')
>>> set(result.keys()) == {
... 'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
... 'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result['count']
2
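To make the role of `pl_threshold`, `ev_threshold`, and `calc_type` more concrete, here is a minimal sketch of a quadrant summary over (pleasantness, eventfulness) pairs. It assumes the usual circumplex reading of the result keys (vibrant = pleasant and eventful, chaotic = unpleasant and eventful, monotonous = unpleasant and uneventful, calm = pleasant and uneventful) and reports "percent" as a fraction of the total. The function name `quadrant_summary` is hypothetical, and the library's own implementation may differ in rounding or edge cases.

```python
def quadrant_summary(points, pl_threshold=0.0, ev_threshold=0.0, calc_type="percent"):
    """Summarise (pleasant, eventful) pairs by circumplex quadrant.

    Hypothetical helper for illustration; not part of soundscapy.
    """
    if calc_type not in ("percent", "count"):
        raise ValueError("calc_type must be 'percent' or 'count'")

    total = len(points)
    counts = {
        "pleasant": sum(1 for p, _ in points if p > pl_threshold),
        "eventful": sum(1 for _, e in points if e > ev_threshold),
        "vibrant": sum(1 for p, e in points if p > pl_threshold and e > ev_threshold),
        "chaotic": sum(1 for p, e in points if p < pl_threshold and e > ev_threshold),
        "monotonous": sum(1 for p, e in points if p < pl_threshold and e < ev_threshold),
        "calm": sum(1 for p, e in points if p > pl_threshold and e < ev_threshold),
    }
    if calc_type == "percent":
        # Assumption: "percent" is expressed as a fraction of all responses.
        counts = {k: (round(c / total, 3) if total else 0.0) for k, c in counts.items()}
    return {"count": total, **counts}


points = [(0.5, 0.5), (-0.5, 0.5), (0.2, -0.3), (-0.1, -0.9)]
print(quadrant_summary(points)["vibrant"])                      # 0.25
print(quadrant_summary(points, calc_type="count")["pleasant"])  # 2
```

Raising a threshold above 0 narrows what counts as "pleasant" or "eventful", which is useful when responses close to the neutral centre should not be assigned to a quadrant.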
Source code in src/soundscapy/databases/isd.py
soundscapy_describe
soundscapy_describe(
    df: DataFrame,
    group_by: str = "LocationID",
    calc_type: str = "percent",
) -> pd.DataFrame
Return a summary of the data grouped by a specified column.
| PARAMETER | DESCRIPTION |
|---|---|
| df | ISD dataframe. TYPE: DataFrame |
| group_by | Column to group by, by default "LocationID". TYPE: str |
| calc_type | Type of summary, either "percent" or "count", by default "percent". TYPE: str |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | Summary of the data. |
Examples:
>>> from soundscapy.surveys.processing import add_iso_coords
>>> df = pd.DataFrame({
... 'LocationID': ['L1', 'L1', 'L2', 'L2'],
... 'PAQ1': [4, 2, 3, 5],
... 'PAQ2': [3, 5, 2, 4],
... 'PAQ3': [2, 4, 1, 3],
... 'PAQ4': [1, 3, 4, 2],
... 'PAQ5': [5, 1, 5, 1],
... 'PAQ6': [4, 2, 3, 5],
... 'PAQ7': [3, 5, 2, 4],
... 'PAQ8': [2, 4, 1, 3],
... })
>>> df = add_iso_coords(df)
>>> result = soundscapy_describe(df)
>>> isinstance(result, pd.DataFrame)
True
>>> result.index.tolist()
['L1', 'L2']
>>> set(result.columns) == {
... 'count', 'ISOPleasant', 'ISOEventful', 'pleasant', 'eventful',
... 'vibrant', 'chaotic', 'monotonous', 'calm'
... }
True
>>> result = soundscapy_describe(df, calc_type="count")
>>> result.loc['L1', 'count']
2
Source code in src/soundscapy/databases/isd.py
ARAUS
soundscapy.databases.araus
Customized functions specifically for the ARAUS dataset.
SATP database helpers
soundscapy.databases.satp
Module for handling the Soundscape Attributes Translation Project (SATP) database.
This module provides functions for loading and processing data from the Soundscape Attributes Translation Project database. It includes utilities for data retrieval from Zenodo and basic data loading operations.
Examples:
>>> import pandas as pd
>>> import soundscapy.databases.satp as satp
>>> df = satp.load_zenodo()
>>> isinstance(df, pd.DataFrame)
True
>>> 'Language' in df.columns
True
>>> satp.load_participants()
Traceback (most recent call last):
...
ValueError: Participant data is only available for SATP versions up to v1.2.1.
>>> participants = satp.load_participants(version="v1.2")
>>> isinstance(participants, pd.DataFrame)
True
>>> 'Age' in participants.columns
True
| CLASS | DESCRIPTION |
|---|---|
| SATPVersion | Versioned SATP dataset releases on Zenodo. |

| FUNCTION | DESCRIPTION |
|---|---|
| load_zenodo | Load the SATP dataset from Zenodo. |
| load_participants | Load the SATP participants dataset from Zenodo. |
SATPVersion
Bases: Enum
Versioned SATP dataset releases on Zenodo.
Each member stores the canonical version string and its Zenodo download URL. Version strings are normalised on lookup so "1.5", "v1.5", and "V1.5" all resolve to the same member. The string "latest" resolves to the first (newest) member.
Examples:
>>> SATPVersion("v1.2").url
'https://zenodo.org/record/7143599/files/SATP%20Dataset%20v1.2.xlsx'
>>> SATPVersion("1.2") is SATPVersion("V1.2")
True
>>> SATPVersion("latest") is SATPVersion.V1_5
True
>>> SATPVersion("invalid")
Traceback (most recent call last):
...
ValueError: 'invalid' is not a valid SATPVersion
| METHOD | DESCRIPTION |
|---|---|
| `__new__` | Create a new member with a canonical version string and download URL. |
| latest | Return the most recent released version (first declared member). |
| `__lt__` | Return True if this version is older than other. |
| `__str__` | Return the canonical version string. |
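The lookup behaviour described for SATPVersion (normalised version strings, a "latest" alias, ordering by declaration) can be reproduced with the standard library's `Enum` hooks. The following is a sketch under stated assumptions rather than the library's source: the class name `DatasetVersion`, its members, and the example.org URLs are all hypothetical placeholders.

```python
from enum import Enum


class DatasetVersion(Enum):
    """Hypothetical versioned-release enum; the newest member is declared first."""

    V2_0 = ("v2.0", "https://example.org/data-v2.0.xlsx")  # placeholder URL
    V1_0 = ("v1.0", "https://example.org/data-v1.0.xlsx")  # placeholder URL

    def __new__(cls, version, url):
        # Store the canonical version string as the value; keep the URL as an attribute.
        obj = object.__new__(cls)
        obj._value_ = version
        obj.url = url
        return obj

    @classmethod
    def _missing_(cls, value):
        # Called only when the plain value lookup fails; normalise and retry.
        if not isinstance(value, str):
            return None
        text = value.strip().lower()
        if text == "latest":
            return cls.latest()
        canonical = text if text.startswith("v") else f"v{text}"
        for member in cls:
            if member.value == canonical:
                return member
        return None  # Enum then raises ValueError for the original input

    @classmethod
    def latest(cls):
        """Return the first declared member, assumed to be the newest release."""
        return next(iter(cls))

    def __lt__(self, other):
        if not isinstance(other, DatasetVersion):
            return NotImplemented
        members = list(type(self))
        # Later in declaration order means older.
        return members.index(self) > members.index(other)

    def __str__(self):
        return self.value


print(DatasetVersion("2.0") is DatasetVersion("V2.0"))  # True
print(DatasetVersion("latest"))                          # v2.0
```

Relying on `_missing_` keeps exact-match lookups on the fast path and leaves error reporting to `Enum`, so an unrecognised string still raises the standard `ValueError`.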
load_zenodo
Load the SATP dataset from Zenodo.
| PARAMETER | DESCRIPTION |
|---|---|
| version | Version of the dataset to load. The default is "latest". |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame containing the SATP dataset. |
Source code in src/soundscapy/databases/satp.py
load_participants
Load the SATP participants dataset from Zenodo.
| PARAMETER | DESCRIPTION |
|---|---|
| version | Version of the dataset to load. The default is "latest". |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame containing the SATP participants dataset. |