University of Southampton Institutional Repository

Multiple hypothesis tracking for overlapping speaker segmentation


Hogg, Aidan, Evers, Christine and Naylor, Patrick A. (2019) Multiple hypothesis tracking for overlapping speaker segmentation. In Proceedings IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA). IEEE. (doi:10.1109/WASPAA.2019.8937185).

Record type: Conference or Workshop Item (Paper)

Abstract

Speaker segmentation is an essential part of any diarization system. Applications of diarization include speaker indexing, improving automatic speech recognition (ASR) performance, and making single-speaker algorithms usable in multi-speaker environments. This paper proposes a multiple hypothesis tracking (MHT) method that exploits the harmonic structure associated with the pitch of voiced speech to segment the onsets and end-points of speech from multiple, overlapping speakers. The proposed method is evaluated against a segmentation system from the literature that uses a spectral representation and is based on bidirectional long short-term memory networks (BLSTMs). The proposed method is shown to achieve comparable performance in segmenting overlapping speakers while using only the pitch harmonic information within the MHT framework.
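The abstract relies on the harmonic structure of voiced speech: a voiced frame has spectral energy near integer multiples of its fundamental frequency. As a rough illustration of that idea only, the sketch below scores candidate fundamentals with a plain harmonic-sum salience (the summed DFT magnitude at the first few multiples of each candidate) and picks the best one. This is a generic textbook measure, not the authors' method, and it performs no multiple hypothesis tracking. The function names `harmonic_salience` and `estimate_pitch`, the 80 to 400 Hz search range, the 2 Hz grid, the five-harmonic limit and the rectangular window are all assumptions chosen for the example.

```python
import math


def harmonic_salience(frame, fs, f0, n_harmonics=5):
    """Sum of DFT magnitudes at the first n_harmonics multiples of f0.

    Each magnitude is a single DFT term evaluated at an arbitrary
    frequency (rectangular window, no normalisation).
    """
    total = 0.0
    for k in range(1, n_harmonics + 1):
        freq = k * f0
        if freq >= fs / 2:  # skip harmonics at or above the Nyquist frequency
            break
        re = im = 0.0
        for n, x in enumerate(frame):
            phase = 2.0 * math.pi * freq * n / fs
            re += x * math.cos(phase)
            im -= x * math.sin(phase)
        total += math.hypot(re, im)
    return total


def estimate_pitch(frame, fs, f0_min=80.0, f0_max=400.0, step=2.0, n_harmonics=5):
    """Return (best_f0, salience) over a uniform grid of candidate fundamentals."""
    best_f0, best_sal = None, -1.0
    n_steps = int(round((f0_max - f0_min) / step))
    for i in range(n_steps + 1):
        f0 = f0_min + i * step  # integer index avoids accumulated float error
        sal = harmonic_salience(frame, fs, f0, n_harmonics)
        if sal > best_sal:
            best_f0, best_sal = f0, sal
    return best_f0, best_sal


if __name__ == "__main__":
    fs = 16000
    # Hypothetical 40 ms voiced frame: five equal-amplitude harmonics of 200 Hz.
    frame = [sum(math.sin(2 * math.pi * k * 200 * n / fs) for k in range(1, 6))
             for n in range(640)]
    f0, sal = estimate_pitch(frame, fs)
    print(f0)  # prints 200.0
```

In a tracking framework the per-frame pitch candidates and their saliences would act as measurements; the MHT stage described in the paper, which decides how candidates form speaker tracks with onsets and end-points, is not shown here.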

This record has no associated files available for download.

More information

Published date: 23 December 2019

Identifiers

Local EPrints ID: 439390
URI: http://eprints.soton.ac.uk/id/eprint/439390
PURE UUID: 8ec1af6a-07f9-4e73-9f6f-6aa7a45ebcf2
ORCID for Christine Evers: orcid.org/0000-0003-0757-5504

Catalogue record

Date deposited: 21 Apr 2020 16:30
Last modified: 17 Mar 2024 04:01


Contributors

Author: Aidan Hogg
Author: Christine Evers
Author: Patrick A. Naylor



