The University of Southampton
University of Southampton Institutional Repository

Speaker change detection using fundamental frequency with application to multi-talker segmentation

IEEE
Hogg, Aidan
e2c97ca1-9ec2-4da1-9fd3-5feea6142756
Evers, Christine
93090c84-e984-4cc3-9363-fbf3f3639c4b
Naylor, Patrick A.
13079486-664a-414c-a1a2-01a30bf0997b

Hogg, Aidan, Evers, Christine and Naylor, Patrick A. (2019) Speaker change detection using fundamental frequency with application to multi-talker segmentation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. (doi:10.1109/ICASSP.2019.8682924).

Record type: Conference or Workshop Item (Paper)

Abstract

This paper shows that time-varying pitch properties can be used advantageously within the segmentation step of a multi-talker diarization system. First, a study is conducted to verify that changes in pitch are strong indicators of changes in the speaker. It is then highlighted that an individual's pitch varies smoothly and can therefore be predicted by means of a Kalman filter. Subsequently, it is shown that if the pitch is not predictable, this is most likely due to a change in the speaker. Finally, a novel system is proposed that uses this pitch-prediction approach for speaker change detection. This system is then evaluated against a commonly used MFCC segmentation system. The proposed system is shown to increase the speaker change detection rate from 43.3% to 70.5% on meetings in the AMI corpus. There are therefore two equally weighted contributions in this paper: 1. We address the question of whether a change in pitch is a reliable indicator of a speaker change in multi-talker meeting audio. 2. We develop a method to extract such speaker changes and test it on a widely available meeting corpus.
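
The idea of flagging a speaker change when the pitch stops being predictable can be illustrated with a minimal sketch. This is not the authors' system: it assumes a scalar random-walk model for the pitch track, and the function name `detect_pitch_changes`, the variances `q` and `r`, and the threshold `gate` are hypothetical choices made only for this illustration. A frame is flagged when the squared prediction error, normalised by its predicted variance, exceeds the gate.

```python
from typing import List, Sequence


def detect_pitch_changes(
    f0: Sequence[float],
    q: float = 4.0,     # assumed process variance of the pitch (Hz^2)
    r: float = 25.0,    # assumed measurement variance of the pitch (Hz^2)
    gate: float = 9.0,  # assumed threshold on the normalised squared innovation
) -> List[int]:
    """Return frame indices where a scalar Kalman filter fails to predict f0.

    Illustrative only: a random-walk state model is an assumption made
    here, not a detail taken from the paper.
    """
    if len(f0) == 0:
        return []

    x = f0[0]  # state estimate: current pitch in Hz
    p = r      # variance of the state estimate
    changes: List[int] = []

    for k in range(1, len(f0)):
        # Predict step: a random walk keeps the mean and inflates the variance.
        p_pred = p + q

        # Innovation: how far the observed pitch is from the prediction.
        innovation = f0[k] - x
        s = p_pred + r  # predicted variance of the innovation

        if innovation * innovation / s > gate:
            # Pitch was not predictable: flag the frame and restart the filter.
            changes.append(k)
            x = f0[k]
            p = r
            continue

        # Update step: blend prediction and observation using the Kalman gain.
        gain = p_pred / s
        x = x + gain * innovation
        p = (1.0 - gain) * p_pred

    return changes


# Synthetic track: 50 frames near 120 Hz followed by 50 frames near 210 Hz.
track = [120.0 + (i % 3) for i in range(50)] + [210.0 + (i % 3) for i in range(50)]
print(detect_pitch_changes(track))
```

On this synthetic track the small frame-to-frame wobble is absorbed by the filter, while the jump between the two pitch ranges produces a large innovation and is flagged. Real pitch tracks contain unvoiced frames and estimation errors, which this sketch does not handle.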

This record has no associated files available for download.

More information

Published date: 17 April 2019

Identifiers

Local EPrints ID: 439803
URI: http://eprints.soton.ac.uk/id/eprint/439803
PURE UUID: 4d0fa01d-ad80-4930-8990-1469f46c4d92
ORCID for Christine Evers: orcid.org/0000-0003-0757-5504

Catalogue record

Date deposited: 05 May 2020 16:30
Last modified: 17 Mar 2024 04:01

Contributors

Author: Aidan Hogg
Author: Christine Evers
Author: Patrick A. Naylor
