STeLiN-US: A Spatio-Temporally Linked Neighborhood Urban Sound Database
Abstract
Automated acoustic understanding, e.g., sound event detection and acoustic scene recognition, is an important research direction that enables numerous modern technologies. Although there is a wealth of corpora, most, if not all, contain acoustic samples of scenes or events in isolation, without considering their interconnectivity with nearby locations in a neighborhood. Within a connected neighborhood, temporal continuity and regional limitation (sound-location dependency) at distinct locations create non-i.i.d. acoustic samples at each site across the spatial and temporal dimensions. To the best of our knowledge, no previous data source takes this particular angle. In this work, we present a novel dataset, the Spatio-temporally Linked Neighborhood Urban Sound (STeLiN-US) database. The dataset is semi-synthesized; that is, each sample is generated by leveraging diverse sets of real urban sounds together with crawled information on real-world user behaviors over time. This method helps create a realistic large-scale dataset, which we further evaluate through perceptual listening tests. This neighborhood-based data generation opens up novel opportunities to advance user-centered applications with automated acoustic understanding. For example, to develop real-world technology that models a user’s speech data over a day, one can imagine using this dataset, since the user’s speech samples would be modulated by diverse surrounding acoustic sources that are linked across sites and shaped over time by the natural behavior dynamics at each location.
Figures
Acoustic Synthesis Map
Keywords
Audio Dataset | Sound Synthesis | Urban Sound | Connected
Authors
Bo-Hao Su, Chi-Chun Lee
Conference
DCASE 2023
The 8th Workshop on Detection and Classification of Acoustic Scenes and Events.
Publisher
DCASE