Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency
1479-1490
Hogg, Aidan
Evers, Christine
Moore, Alastair H.
Naylor, Patrick
18 March 2021
Hogg, Aidan, Evers, Christine, Moore, Alastair H. and Naylor, Patrick (2021) Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 1479-1490, [9381673]. (doi:10.1109/TASLP.2021.3067161).
Abstract
This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change in the speaker can be inferred from a change in pitch. We show that voiced harmonics can be useful in detecting when more than one speaker is talking, such as during overlapping speaker activity. A novel system is proposed to track multiple harmonics simultaneously, allowing for the determination of onsets and end-points of a speaker's utterance in the presence of an additional active speaker. This system is benchmarked against a segmentation system from the literature that employs a bidirectional long short-term memory network (BLSTM) approach and requires training. Experimental results highlight that the proposed approach outperforms the BLSTM baseline approach by 12.9% in terms of HIT rate for speaker segmentation. We also show that the estimated pitch tracks of our system can be used as input features to the BLSTM to achieve further improvements of 1.21% in terms of coverage and 2.45% in terms of purity.
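The idea that a change in speaker can be inferred from a change in pitch can be illustrated with a toy sketch. This is not the paper's method, which tracks multiple harmonics simultaneously under multiple hypotheses. It only flags frames where a single fundamental frequency track jumps sharply. The function name pitch_change_points, the 4-semitone threshold, and the convention that 0.0 marks an unvoiced frame are all assumptions made for illustration.

```python
import math


def pitch_change_points(f0_hz, threshold_semitones=4.0):
    """Return frame indices where f0 jumps by more than the threshold.

    f0_hz: per-frame fundamental frequency estimates in Hz, with 0.0
    marking an unvoiced frame (an assumed convention, not the paper's).
    threshold_semitones: illustrative value, not taken from the paper.
    """
    changes = []
    previous = None  # last voiced f0 seen so far
    for index, f0 in enumerate(f0_hz):
        if f0 <= 0.0:
            continue  # skip unvoiced frames
        if previous is not None:
            # Pitch distance on a log scale, in semitones.
            jump = abs(12.0 * math.log2(f0 / previous))
            if jump > threshold_semitones:
                changes.append(index)
        previous = f0
    return changes


# Hypothetical track: one voice near 120 Hz, a pause, then one near 210 Hz.
track = [120.0, 122.0, 121.0, 0.0, 210.0, 212.0, 208.0]
print(pitch_change_points(track))  # prints [4]
```

A single-track heuristic like this cannot represent two speakers talking at once, which is the overlapping case the paper addresses by tracking several pitch hypotheses in parallel.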
Text: IEEE_Transactions_2020_Overlapping_speaker_segmentation_using_multiple_hypothesis_tracking_of_fundamental_frequency (Accepted Manuscript)
More information
Accepted/In Press date: 7 March 2021
e-pub ahead of print date: 18 March 2021
Published date: 18 March 2021
Keywords: Harmonic analysis, Hidden Markov models, Kalman filter, Kalman filters, Microphones, Reliability, Speech processing, Task analysis, pitch tracking, speaker segmentation
Identifiers
Local EPrints ID: 448040
URI: http://eprints.soton.ac.uk/id/eprint/448040
ISSN: 2329-9304
PURE UUID: 5165152c-75df-49a8-a1a4-e9eb7d9d36d6
Catalogue record
Date deposited: 30 Mar 2021 16:37
Last modified: 17 Mar 2024 04:01
Contributors
Author: Aidan Hogg
Author: Christine Evers
Author: Alastair H. Moore
Author: Patrick Naylor