On acoustic emotion recognition: compensating for covariate shift
Hassan, A., Damper, R.I. and Niranjan, M.
(2013)
On acoustic emotion recognition: compensating for covariate shift.
IEEE Transactions on Audio, Speech, and Language Processing, 21 (7), 1458-1468.
(doi:10.1109/TASL.2013.2255278).
Abstract
Pattern recognition tasks often face the situation that training data are not fully representative of test data. This problem is well-recognized in speech recognition, where methods like cepstral mean normalization (CMN), vocal tract length normalization (VTLN) and maximum likelihood linear regression (MLLR) are used to compensate for channel and speaker differences. Speech emotion recognition (SER) is an important emerging field in human-computer interaction and faces the same data shift problems, a fact which has been generally overlooked in this domain. In this paper, we show that compensating for channel and speaker differences can give significant improvements in SER by modelling these differences as a covariate shift. We employ three algorithms from the domain of transfer learning that apply importance weights (IWs) within a support vector machine classifier to reduce the effects of covariate shift. We test these methods on the FAU Aibo Emotion Corpus, which was used in the Interspeech 2009 Emotion Challenge. It consists of two separate parts recorded independently at different schools; hence the two parts exhibit covariate shift. Results show that the IW methods outperform combined CMN and VTLN and significantly improve on the baseline performance of the Challenge. The best of the three methods also improves significantly on the winning contribution to the Challenge.
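The abstract describes applying importance weights (IWs) within a support vector machine classifier to compensate for covariate shift between training and test data. The sketch below is a rough illustration only, not the authors' method: the paper evaluates three dedicated transfer-learning IW estimators, whereas this example uses a hypothetical logistic-regression density-ratio step to estimate the weights and then passes them to scikit-learn's SVC via sample_weight, which scales each training sample's contribution to the hinge loss.

    # Minimal, hypothetical sketch of importance-weighted SVM training for
    # covariate shift. Not the paper's implementation: the weight-estimation
    # step here (a logistic-regression density-ratio estimate) is an assumption
    # made for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    def estimate_importance_weights(X_train, X_test):
        """Estimate w(x) = p_test(x) / p_train(x) with a domain classifier."""
        X = np.vstack([X_train, X_test])
        # Domain labels: 0 = training (source), 1 = test (target).
        d = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
        clf = LogisticRegression(max_iter=1000).fit(X, d)
        p = clf.predict_proba(X_train)[:, 1]            # P(domain = test | x)
        w = p / np.clip(1.0 - p, 1e-6, None)            # density-ratio estimate
        return w * (len(X_train) / len(X_test))         # correct for sample-size imbalance

    def train_iw_svm(X_train, y_train, X_test, C=1.0):
        """Fit an SVM whose per-sample losses are scaled by importance weights."""
        w = estimate_importance_weights(X_train, X_test)
        svm = SVC(kernel="rbf", C=C)
        svm.fit(X_train, y_train, sample_weight=w)      # weighted hinge loss
        return svm

Samples that look more like the test distribution receive larger weights, so the classifier's decision boundary is biased towards the region the test data occupy, which is the general idea behind covariate-shift compensation by importance weighting.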
More information
e-pub ahead of print date: 27 March 2013
Published date: July 2013
Organisations:
Southampton Wireless Group
Identifiers
Local EPrints ID: 350948
URI: http://eprints.soton.ac.uk/id/eprint/350948
ISSN: 1558-7916
PURE UUID: 3fa3b776-6e18-4a3c-8b98-e194157b06ea
Catalogue record
Date deposited: 11 Apr 2013 09:57
Last modified: 15 Mar 2024 03:29
Contributors
Author: A. Hassan
Author: R.I. Damper
Author: M. Niranjan