Working with Soundscape Survey Data

import matplotlib.pyplot as plt
import pandas as pd

import soundscapy as sspy
from soundscapy.databases import isd
from soundscapy.surveys import LANGUAGE_ANGLES, PAQ_IDS, rename_paqs

Introduction

This tutorial provides a comprehensive guide to working with soundscape survey data using the Soundscapy package. We’ll cover loading, validating, cleaning, and processing survey data, with a focus on the International Soundscape Database (ISD) format.

Learning Objectives

By the end of this tutorial, you will be able to: - Load and validate soundscape survey data - Perform data quality checks and handle missing values - Calculate ISO coordinates from PAQ ratings - Filter and group data by various criteria - Work with different survey formats and scales - Handle multi-language data using language-specific angles

Let’s begin by importing the necessary libraries:

1. Loading Soundscape Survey Data

Soundscapy provides several ways to load soundscape survey data. The most common approach is to use the built-in functions for loading the International Soundscape Database (ISD).

# Load the ISD dataset included with Soundscapy
data = isd.load()

# Display basic information about the dataset
print(f"Dataset shape: {data.shape}")
print(f"Number of locations: {data['LocationID'].nunique()}")
print(f"Number of records: {data['RecordID'].nunique()}")
print(f"PAQ columns: {PAQ_IDS}")

Loading from Zenodo

You can also load the ISD dataset directly from Zenodo, which allows you to access different versions of the dataset:

# Load a specific version of the ISD dataset from Zenodo
# Note: This requires an internet connection
# data_zenodo = isd.load_zenodo(version="v1.0.1")
# print(f"Zenodo dataset shape: {data_zenodo.shape}")

Loading Your Own Data

If you have your own soundscape survey data, you can load it using pandas and then process it with Soundscapy. The key is to ensure that your data includes the PAQ columns (PAQ1-PAQ8) or equivalent columns that can be renamed to match the expected format.

Example of loading your own data (commented out as it’s just an example)

# Load data from a CSV file
my_data = pd.read_csv('path/to/your/data.csv')

# If your PAQ columns have different names, rename them
paq_mapping = {
    'pleasant': 'PAQ1',
    'vibrant': 'PAQ2',
    'eventful': 'PAQ3',
    'chaotic': 'PAQ4',
    'annoying': 'PAQ5',
    'monotonous': 'PAQ6',
    'uneventful': 'PAQ7',
    'calm': 'PAQ8'
}
my_data = rename_paqs(my_data, paq_mapping)

2. Data Validation and Quality Checks

Before analyzing soundscape data, it’s important to validate it and check for data quality issues. Soundscapy provides functions for this purpose.

# Validate the dataset
valid_data, excluded_data = isd.validate(data)

print(f"Valid samples: {valid_data.shape[0]}")
print(f"Excluded samples: {0 if excluded_data is None else excluded_data.shape[0]}")

Understanding Data Validation

The validate function performs several checks on the PAQ data:

Missing Values: By default, it checks for missing values in the PAQ columns. You can allow missing values by setting allow_paq_na=True.
Value Range: It checks that all PAQ values are within the expected range (default is 1-5 for a 5-point Likert scale).
Constant Responses: It checks for responses where all PAQs have the same value (except for the midpoint value), which may indicate inattentive responding.

Let’s look at how to customize these checks:

# Custom validation with different parameters
custom_valid_data, custom_excluded_data = isd.validate(
    data,
    val_range=(1, 5),  # Specify the expected range of PAQ values
    allow_paq_na=False,  # Don't allow missing values in PAQs
)

print(f"Valid samples with custom validation: {custom_valid_data.shape[0]}")
print(
    f"Excluded samples with custom validation: {0 if custom_excluded_data is None else custom_excluded_data.shape[0]}"
)

Manual Data Quality Checks

You can also perform your own data quality checks using pandas functions:

# Check for missing values in PAQ columns
missing_counts = data[PAQ_IDS].isna().sum()
print("Missing values in PAQ columns:")
print(missing_counts)

# Check for values outside the expected range
out_of_range = ((data[PAQ_IDS] < 1) | (data[PAQ_IDS] > 5)).sum()
print("\nValues outside the expected range (1-5):")
print(out_of_range)

3. Calculating ISO Coordinates

Once you have validated your data, the next step is to calculate the ISO coordinates (ISOPleasant and ISOEventful) from the PAQ ratings. Soundscapy provides functions for this purpose.

# Calculate ISO coordinates
iso_pleasant, iso_eventful = sspy.surveys.calculate_iso_coords(valid_data[PAQ_IDS])

# Add the ISO coordinates to the dataframe
valid_data_with_iso = valid_data.copy()
valid_data_with_iso["ISOPleasant"] = iso_pleasant
valid_data_with_iso["ISOEventful"] = iso_eventful

# Display the first few rows
print("Data with ISO coordinates:")
print(valid_data_with_iso[["ISOPleasant", "ISOEventful"]].head())

# Alternatively, use the add_iso_coords function
valid_data_with_iso = sspy.surveys.add_iso_coords(valid_data)
print("\nUsing add_iso_coords function:")
print(valid_data_with_iso[["ISOPleasant", "ISOEventful"]].head())

Customizing ISO Coordinate Calculation

You can customize the ISO coordinate calculation by specifying the value range of the PAQ ratings and the angles of the PAQs in the circumplex model:

# Custom ISO coordinate calculation
custom_iso_data = sspy.surveys.add_iso_coords(
    valid_data,
    val_range=(1, 5),  # Specify the range of PAQ values
    names=("CustomPleasant", "CustomEventful"),  # Specify custom column names
    angles=(0, 45, 90, 135, 180, 225, 270, 315),  # Specify custom angles
    overwrite=True,  # Overwrite existing columns if they exist
)

# Compare the default and custom ISO coordinates
comparison_df = pd.DataFrame(
    {
        "ISOPleasant": valid_data_with_iso["ISOPleasant"],
        "CustomPleasant": custom_iso_data["CustomPleasant"],
        "ISOEventful": valid_data_with_iso["ISOEventful"],
        "CustomEventful": custom_iso_data["CustomEventful"],
    }
)

print("Comparison of default and custom ISO coordinates:")
print(comparison_df.head())

# Visualize the ISO coordinates
sspy.scatter(data=valid_data_with_iso, x="ISOPleasant", y="ISOEventful")
plt.title("ISO Coordinates")

4. Filtering and Grouping Data

Soundscapy provides several functions for filtering and grouping soundscape data based on various criteria.

# Filter by location
camden_data = isd.select_location_ids(valid_data_with_iso, "CamdenTown")
print(f"Camden Town samples: {camden_data.shape[0]}")

# Filter by multiple locations
multiple_locations = isd.select_location_ids(
    valid_data_with_iso, ["CamdenTown", "TateModern"]
)
print(f"Camden Town and Hyde Park samples: {multiple_locations.shape[0]}")

Complex Filtering with Pandas

You can also use pandas’ query function for more complex filtering:

# Filter by demographic variables
if "gen00" in valid_data_with_iso.columns and "age00" in valid_data_with_iso.columns:
    women_over_50 = valid_data_with_iso.query("gen00 == 'Female' and age00 > 50")
    print(f"Women over 50: {women_over_50.shape[0]}")

# Filter by ISO coordinates
pleasant_soundscapes = valid_data_with_iso.query("ISOPleasant > 0.5")
print(f"Pleasant soundscapes (ISOPleasant > 0.5): {pleasant_soundscapes.shape[0]}")

Grouping and Summarizing Data

You can use pandas’ groupby function to group and summarize soundscape data:

# Group by location and calculate mean ISO coordinates
location_means = valid_data_with_iso.groupby("LocationID")[
    ["ISOPleasant", "ISOEventful"]
].mean()
print("Mean ISO coordinates by location:")
print(location_means.head())

# Calculate standard deviations
location_stds = valid_data_with_iso.groupby("LocationID")[
    ["ISOPleasant", "ISOEventful"]
].std()
print("\nStandard deviations of ISO coordinates by location:")
location_stds.head()

Using the soundscapy_describe Function

Soundscapy provides a soundscapy_describe function that summarizes soundscape data by location:

# Describe locations
location_summary = isd.soundscapy_describe(valid_data_with_iso)
print("Soundscape summary by location:")
location_summary.head()

# Describe a specific location
camden_summary = isd.describe_location(valid_data_with_iso, "CamdenTown")
print("\nCamden Town summary:")
for key, value in camden_summary.items():
    print(f"{key}: {value}")

5. Working with Different Survey Formats and Scales

Soundscape surveys may use different scales for PAQ ratings. Soundscapy can handle different scales by specifying the value range when calculating ISO coordinates.

# Create a sample dataset with a different scale (0-100)
sample_data = pd.DataFrame(
    {
        "RecordID": ["EX1", "EX2", "EX3"],
        "PAQ1": [80, 60, 90],
        "PAQ2": [70, 50, 85],
        "PAQ3": [60, 40, 80],
        "PAQ4": [50, 30, 75],
        "PAQ5": [40, 20, 70],
        "PAQ6": [30, 10, 65],
        "PAQ7": [20, 0, 60],
        "PAQ8": [10, 5, 55],
    }
)

# Calculate ISO coordinates with a different scale
sample_data_with_iso = sspy.surveys.add_iso_coords(
    sample_data,
    val_range=(0, 100),  # Specify the range of PAQ values
    names=("ISOPleasant", "ISOEventful"),
    overwrite=True,
)

print("Sample data with ISO coordinates (0-100 scale):")
sample_data_with_iso[["ISOPleasant", "ISOEventful"]]

# Create a sample dataset with different column names
sample_data_2 = pd.DataFrame(
    {
        "RecordID": ["EX1", "EX2", "EX3"],
        "Pleasant": [4, 2, 5],
        "Vibrant": [3, 1, 4],
        "Eventful": [2, 3, 3],
        "Chaotic": [1, 4, 2],
        "Annoying": [2, 5, 1],
        "Monotonous": [3, 4, 2],
        "Uneventful": [4, 3, 3],
        "Calm": [5, 2, 4],
    }
)

# Rename columns to match PAQ format
paq_mapping = {
    "Pleasant": "PAQ1",
    "Vibrant": "PAQ2",
    "Eventful": "PAQ3",
    "Chaotic": "PAQ4",
    "Annoying": "PAQ5",
    "Monotonous": "PAQ6",
    "Uneventful": "PAQ7",
    "Calm": "PAQ8",
}
sample_data_2_renamed = rename_paqs(sample_data_2, paq_mapping)

# Calculate ISO coordinates
sample_data_2_with_iso = sspy.surveys.add_iso_coords(sample_data_2_renamed)

print("Sample data with renamed columns and ISO coordinates:")
sample_data_2_with_iso[["ISOPleasant", "ISOEventful"]]

6. Handling Multi-Language Data

Soundscape perception can vary across languages and cultures. Soundscapy provides language-specific angles for calculating ISO coordinates that account for these differences.

# Display the available language-specific angles
print("Available language-specific angles:")
for language, angles in LANGUAGE_ANGLES.items():
    print(f"{language}: {angles}")

Applying Language-Specific Angles

You can apply language-specific angles when calculating ISO coordinates:

# Calculate ISO coordinates with language-specific angles
english_angles = LANGUAGE_ANGLES["eng"]
spanish_angles = LANGUAGE_ANGLES["spa"]

# Using English angles
english_iso_data = sspy.surveys.add_iso_coords(
    valid_data,
    angles=english_angles,
    names=("English_Pleasant", "English_Eventful"),
    overwrite=True,
)

# Using Spanish angles
spanish_iso_data = sspy.surveys.add_iso_coords(
    valid_data,
    angles=spanish_angles,
    names=("Spanish_Pleasant", "Spanish_Eventful"),
    overwrite=True,
)

# Compare the results
comparison_df = pd.DataFrame(
    {
        "English_Pleasant": english_iso_data["English_Pleasant"],
        "Spanish_Pleasant": spanish_iso_data["Spanish_Pleasant"],
        "English_Eventful": english_iso_data["English_Eventful"],
        "Spanish_Eventful": spanish_iso_data["Spanish_Eventful"],
    }
)

print("Comparison of ISO coordinates with different language angles:")
comparison_df.head()

Applying Language-Specific Angles to a Mixed-Language Dataset

If your dataset contains responses in multiple languages, you can apply the appropriate angles for each language:

# Filter data for English and Spanish languages
if "Language" in valid_data_with_iso.columns:
    eng_data = valid_data_with_iso[valid_data_with_iso["Language"] == "eng"]
    spa_data = valid_data_with_iso[valid_data_with_iso["Language"] == "spa"]

    print(f"English responses: {eng_data.shape[0]}")
    print(f"Spanish responses: {spa_data.shape[0]}")

# Function to apply language-specific angles to each row
def apply_language_angles(row):
    if "Language" not in row or row["Language"] not in LANGUAGE_ANGLES:
        return sspy.surveys.calculate_iso_coords(
            row[PAQ_IDS], angles=LANGUAGE_ANGLES["eng"]
        )
    return sspy.surveys.calculate_iso_coords(
        row[PAQ_IDS], angles=LANGUAGE_ANGLES[row["Language"]]
    )

# Example of applying this function (commented out as it depends on the dataset structure)
# Apply language-specific angles to each row
if 'Language' in valid_data_with_iso.columns:
    # Apply the function to each row
    results = valid_data_with_iso.apply(apply_language_angles, axis=1)

    # Extract the results
    valid_data_with_iso['Lang_ISOPleasant'] = [r[0] for r in results]
    valid_data_with_iso['Lang_ISOEventful'] = [r[1] for r in results]

    print("ISO coordinates with language-specific angles:")
    print(valid_data_with_iso[['Lang_ISOPleasant', 'Lang_ISOEventful']].head())

7. Simulating Soundscape Data

Soundscapy provides a function for simulating soundscape data, which can be useful for testing and demonstration purposes.

# Simulate soundscape data
simulated_data = sspy.surveys.simulation(
    n=100,  # Number of samples
    val_range=(1, 5),  # Range of PAQ values
    incl_iso_coords=True,  # Include ISO coordinates
)

print("Simulated soundscape data:")
print(simulated_data.head())

# Visualize the simulated data
sspy.scatter(simulated_data, title="Simulated Soundscape Data")

8. Conclusion

In this tutorial, we’ve covered the essential aspects of working with soundscape survey data using the Soundscapy package:

Loading Data: We learned how to load data from the built-in ISD dataset and from external sources.
Data Validation: We explored how to validate and clean soundscape data to ensure quality.
ISO Coordinates: We calculated ISO coordinates from PAQ ratings and customized the calculation.
Filtering and Grouping: We filtered and grouped data based on various criteria.
Different Scales: We worked with different survey formats and scales.
Multi-Language Data: We applied language-specific angles to account for cultural differences.
Simulation: We simulated soundscape data for testing and demonstration.

These tools and techniques provide a solid foundation for analyzing soundscape survey data and can be applied to a wide range of research questions and datasets.