STELIN-US: A Spatio-Temporally Linked Neighborhood Urban Sound Database｜BIIC Lab - NTHU

Other: Computation Methods for Health

ZENODO

Abstract

Automated acoustic understanding, e.g., sound event detection and acoustic scene recognition, is an important research direction that enables numerous modern technologies. Although there is a wealth of corpora, most, if not all, contain acoustic samples of scenes or events in isolation, without considering their interconnectivity with nearby locations in a neighborhood. Within a connected neighborhood, the temporal continuity and regional limitation (sound-location dependency) at distinct locations create non-i.i.d. acoustic samples at each site across the spatial and temporal dimensions. To the best of our knowledge, no previous data source takes this particular angle. In this work, we present a novel dataset, the Spatio-temporally Linked Neighborhood Urban Sound (STeLiN-US) database. The dataset is semi-synthesized; that is, each sample is generated by combining diverse sets of real urban sounds with crawled information about real-world user behavior over time. This method helps create a realistic large-scale dataset, which we further evaluate through perceptual listening tests. This neighborhood-based data generation opens up novel opportunities to advance user-centered applications with automated acoustic understanding. For example, to develop real-world technology that models a user’s speech data over a day, one can imagine using this dataset, since the user’s speech samples would be modulated by diverse surrounding acoustic sources that are linked across sites and shaped over time by the natural behavior dynamics at each location.

Figures

Acoustic Synthesis Map

Keywords

Audio Dataset ｜ Sound Synthesis ｜ Urban Sound ｜ Connected

Authors

Conference

The 8th Workshop on Detection and Classification of Acoustic Scenes and Events.

Publisher

DCASE
