HRTF-based data augmentation method for acoustic scene classification
Liu, Yingzi, Yang, Haocong, Shi, Chuang and Liang, Jiangnan (2021) HRTF-based data augmentation method for acoustic scene classification. In 2021 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB). IEEE. 5 pp. (doi:10.1109/BMSB53066.2021.9547082).
Record type: Conference or Workshop Item (Paper)
Abstract
In acoustic scene classification (ASC), the variety of recording devices poses a technical problem that has yet to be solved. The amount of data recorded by different devices is usually unbalanced, and a model trained on audio collected with one device transfers poorly to another device. To improve cross-device performance, this paper proposes a data augmentation method for ASC systems that take monaural audio samples as input, in which head-related transfer functions (HRTFs) are used to add artificial spatial information to the monaural samples. The proposed method enables ASC systems to imitate the ability of human binaural hearing to distinguish spatial orientation and to lock onto specific sound sources. Experimental results show that, with the proposed method, the VGGNet and ResNet systems achieve 13.4% and 14.4% higher accuracy, respectively, than the DCASE 2020 baseline in cross-device ASC.
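The abstract does not spell out the augmentation pipeline, so the following is only a minimal sketch of the general idea of HRTF-based spatialisation, not the authors' implementation. It convolves a monaural sample with a left-ear and a right-ear head-related impulse response (HRIR, the time-domain form of an HRTF) chosen at random from a bank, producing a two-channel signal. Assumptions: the names `convolve`, `toy_hrir` and `hrtf_augment` are hypothetical, the azimuth grid is arbitrary, and `toy_hrir` is a crude stand-in that models only an interaural time and level difference, whereas a real system would use measured HRIRs.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical HRIR bank: azimuth in degrees -> (left-ear IR, right-ear IR).
HrirBank = Dict[int, Tuple[List[float], List[float]]]


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Full linear convolution; output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, x in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += x * h
    return out


def toy_hrir(azimuth_deg: int, max_delay: int = 4) -> Tuple[List[float], List[float]]:
    """Crude stand-in for a measured HRIR pair (an assumption, not a real HRTF).

    Models only an interaural time difference (an integer-sample delay at the
    far ear) and an interaural level difference (attenuation at the far ear).
    Azimuth runs from -90 (hard left) to +90 (hard right).
    """
    frac = azimuth_deg / 90.0
    delay = round(abs(frac) * max_delay)
    far_gain = 1.0 - 0.5 * abs(frac)
    near = [1.0] + [0.0] * max_delay
    far = [0.0] * delay + [far_gain] + [0.0] * (max_delay - delay)
    # Source on the right: the right ear is the near ear, and vice versa.
    return (far, near) if frac > 0 else (near, far)


def hrtf_augment(
    mono: Sequence[float], bank: HrirBank, rng: random.Random
) -> Tuple[int, List[float], List[float]]:
    """Place a monaural sample at a randomly chosen azimuth from the bank.

    Returns the chosen azimuth and the left/right channels of the
    spatialised (two-channel) signal.
    """
    azimuth = rng.choice(sorted(bank))
    left_ir, right_ir = bank[azimuth]
    return azimuth, convolve(mono, left_ir), convolve(mono, right_ir)


if __name__ == "__main__":
    bank = {az: toy_hrir(az) for az in (-90, -45, 0, 45, 90)}
    mono = [1.0, 0.5, 0.0, -0.5]
    az, left, right = hrtf_augment(mono, bank, random.Random(0))
    print(az, left, right)
```

Drawing a fresh direction per training example is one plausible way to turn a single monaural recording into several spatially distinct variants. In practice the direct double loop would be replaced by FFT-based convolution, and the HRIR bank would be loaded from a measured HRTF set (for example one stored in the AES69 SOFA format).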
Text: BMSB2021_Liu_Submission (Accepted Manuscript)
More information
e-pub ahead of print date: 1 October 2021
Venue - Dates: 2021 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Chengdu, China, 2021-08-04 to 2021-08-06
Identifiers
Local EPrints ID: 483661
URI: http://eprints.soton.ac.uk/id/eprint/483661
PURE UUID: 021701fc-8a74-4297-9ffd-d409dd6b1b43
Catalogue record
Date deposited: 03 Nov 2023 17:34
Last modified: 18 Mar 2024 04:13
Contributors
Author: Yingzi Liu
Author: Haocong Yang
Author: Chuang Shi
Author: Jiangnan Liang