The University of Southampton
University of Southampton Institutional Repository

Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency

Authors: Hogg, Aidan; Evers, Christine; Moore, Alastair H.; Naylor, Patrick

Hogg, Aidan, Evers, Christine, Moore, Alastair H. and Naylor, Patrick (2021) Overlapping speaker segmentation using multiple hypothesis tracking of fundamental frequency. IEEE/ACM Transactions on Audio, Speech, and Language Processing. (doi:10.1109/TASLP.2021.3067161).

Record type: Article

Abstract

This paper demonstrates how the harmonic structure of voiced speech can be exploited to segment multiple overlapping speakers in a speaker diarization task. We explore how a change of speaker can be inferred from a change in pitch, and show that voiced harmonics are useful for detecting when more than one speaker is talking, as happens during overlapping speaker activity. A novel system is proposed that tracks multiple harmonics simultaneously, allowing the onsets and end-points of a speaker's utterance to be determined in the presence of an additional active speaker. The system is benchmarked against a segmentation system from the literature that employs a bidirectional long short-term memory (BLSTM) network and requires training. Experimental results show that the proposed approach outperforms the BLSTM baseline by 12.9% in terms of HIT rate for speaker segmentation. We also show that the pitch tracks estimated by our system can be used as input features to the BLSTM, yielding further improvements of 1.21% in coverage and 2.45% in purity.
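The idea that a change of speaker can be inferred from a change in pitch can be illustrated with a minimal sketch. This is not the multiple hypothesis tracking system proposed in the paper: it is a much simpler single-track heuristic, and the function name `pitch_change_points`, the 4-semitone threshold, and the example pitch track are all hypothetical choices made for illustration. It assumes a per-frame fundamental frequency estimate is already available, with 0 marking unvoiced frames.

```python
import math


def pitch_change_points(f0_hz, max_jump_semitones=4.0):
    """Return frame indices where F0 jumps by more than the threshold.

    f0_hz: per-frame fundamental frequency in Hz; 0 marks unvoiced frames.
    Each voiced frame is compared with the most recent voiced frame, so a
    short unvoiced gap does not hide a jump. The threshold is an assumed
    value, not one taken from the paper.
    """
    changes = []
    last_voiced = None
    for i, f0 in enumerate(f0_hz):
        if f0 <= 0:
            continue  # skip unvoiced frames
        if last_voiced is not None:
            # Pitch distance in semitones between consecutive voiced frames.
            jump = abs(12.0 * math.log2(f0 / last_voiced))
            if jump > max_jump_semitones:
                changes.append(i)
        last_voiced = f0
    return changes


# Hypothetical pitch track: a talker near 120 Hz, then one near 210 Hz,
# then a return to roughly 125 Hz.
track = [120, 122, 121, 0, 0, 119, 210, 212, 208, 0, 125, 123]
print(pitch_change_points(track))
```

A single-track heuristic like this cannot represent two simultaneously active speakers, which is the case the paper addresses by tracking multiple harmonics at once.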

Text
IEEE_Transactions_2020_Overlapping_speaker_segmentation_using_multiple_hypothesis_tracking_of_fundamental_frequency - Accepted Manuscript

More information

Accepted/In Press date: 7 March 2021
e-pub ahead of print date: 18 March 2021

Identifiers

Local EPrints ID: 448040
URI: http://eprints.soton.ac.uk/id/eprint/448040
ISSN: 2329-9304
PURE UUID: 5165152c-75df-49a8-a1a4-e9eb7d9d36d6
ORCID for Christine Evers: orcid.org/0000-0003-0757-5504

Catalogue record

Date deposited: 30 Mar 2021 16:37
Last modified: 13 Apr 2021 02:05
