Speaker change detection using fundamental frequency with application to multi-talker segmentation
This paper shows that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First, a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual's pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this pitch-prediction approach for speaker change detection. This system is then evaluated against a commonly used MFCC segmentation system. The proposed system is shown to increase the speaker change detection rate from 43.3% to 70.5% on meetings in the AMI corpus. There are therefore two equally weighted contributions in this paper: 1. We address the question of whether a change in pitch is a reliable estimator of a speaker change in multi-talker meeting audio. 2. We develop a method to extract such speaker changes and test it on a widely available meeting corpus.
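The idea summarised in the abstract (predict the pitch with a Kalman filter and treat an unpredictable pitch as a likely speaker change) can be illustrated with a minimal sketch. This is not the authors' system: the random-walk state model, the noise variances `q` and `r`, the threshold `gate`, and the function name `detect_pitch_changes` are all assumptions chosen for illustration, and the input is assumed to be a pitch track in Hz that has already been extracted for voiced frames.

```python
import numpy as np


def detect_pitch_changes(f0, q=4.0, r=25.0, gate=9.0):
    """Flag frames whose pitch is poorly predicted by a scalar Kalman filter.

    f0   : 1-D sequence of voiced-frame pitch estimates in Hz
    q, r : assumed process / measurement noise variances in Hz^2 (illustrative)
    gate : assumed threshold on the normalised innovation squared
    """
    f0 = np.asarray(f0, dtype=float)
    if f0.size == 0:
        return []
    # Initialise the state from the first observation.
    x, p = f0[0], r
    changes = []
    for k in range(1, f0.size):
        # Predict: random-walk model, pitch is expected to stay near the last estimate.
        x_pred, p_pred = x, p + q
        # Innovation (prediction error) and its variance.
        nu = f0[k] - x_pred
        s = p_pred + r
        if nu * nu / s > gate:
            # Pitch was not predictable: record a candidate speaker change
            # and restart the filter on the new pitch level.
            changes.append(k)
            x, p = f0[k], r
            continue
        # Update: blend prediction and observation using the Kalman gain.
        g = p_pred / s
        x = x_pred + g * nu
        p = (1.0 - g) * p_pred
    return changes


if __name__ == "__main__":
    # Synthetic track: one talker near 120 Hz, then another near 210 Hz.
    t = np.arange(50)
    track = np.concatenate([120 + 2 * np.sin(0.3 * t), 210 + 2 * np.sin(0.3 * t)])
    print(detect_pitch_changes(track))
```

On the synthetic track in the usage example, the only large prediction error occurs at the jump between the two pitch levels, so that frame is the one flagged. Real pitch tracks contain octave errors and unvoiced gaps, which a sketch like this does not handle.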
Hogg, Aidan
e2c97ca1-9ec2-4da1-9fd3-5feea6142756
Evers, Christine
93090c84-e984-4cc3-9363-fbf3f3639c4b
Naylor, Patrick A.
13079486-664a-414c-a1a2-01a30bf0997b
17 April 2019
Hogg, Aidan, Evers, Christine and Naylor, Patrick A. (2019) Speaker change detection using fundamental frequency with application to multi-talker segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. (doi:10.1109/ICASSP.2019.8682924).
Record type: Conference or Workshop Item (Paper)
More information
Published date: 17 April 2019
Identifiers
Local EPrints ID: 439803
URI: http://eprints.soton.ac.uk/id/eprint/439803
PURE UUID: 4d0fa01d-ad80-4930-8990-1469f46c4d92
Catalogue record
Date deposited: 05 May 2020 16:30
Last modified: 17 Mar 2024 04:01
Contributors
Author: Aidan Hogg
Author: Christine Evers
Author: Patrick A. Naylor