The University of Southampton
University of Southampton Institutional Repository

Dataset in support of the thesis 'Speech enhancement by using deep learning algorithms'


Cui, Jianqiao, Panayotov, Vassil, Chen, Guoguo, Povey, Daniel, Khudanpur, Sanjeev and Snyder, David (2024) Dataset in support of the thesis 'Speech enhancement by using deep learning algorithms'. University of Southampton doi:10.5258/SOTON/D3161 [Dataset]

Record type: Dataset

Abstract

The source code and audio datasets of my PhD project.
1. LibriSpeech (https://www.openslr.org/12): a corpus of approximately 1000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project and has been carefully segmented and aligned. Acoustic models trained on this data set are available at kaldi-asr.org, and language models suitable for evaluation can be found at http://www.openslr.org/11/. For more information, see the paper "LibriSpeech: an ASR corpus based on public domain audio books", Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2015.
2. MUSAN (https://www.openslr.org/17): a corpus of music, speech, and noise recordings. The MUSAN work was supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1232825 and by Spoken Communications. You can cite the data using the following BibTeX entry:
   @misc{musan2015,
     author = {David Snyder and Guoguo Chen and Daniel Povey},
     title = {{MUSAN}: {A} {M}usic, {S}peech, and {N}oise {C}orpus},
     year = {2015},
     eprint = {1510.08484},
     note = {arXiv:1510.08484v1}
   }
3. source_code.zip: the source code for parts of my PhD project.
4. SJ_EXP.zip: the program for the subjective experiment corresponding to the last chapter of the thesis.

thesis_readme_Cui.txt - Dataset (Text, 2kB)
source_code.zip - Model (Archive, 36kB). Available under License All Rights Reserved.
SJ_EXP.zip - Model (Archive, 263MB). Available under License All Rights Reserved.

More information

Published date: 2024
Keywords: speech enhancement

Identifiers

Local EPrints ID: 492128
URI: http://eprints.soton.ac.uk/id/eprint/492128
PURE UUID: 4fbd2d4f-7fc8-42ea-bcfc-ca08cdca05aa
ORCID for Jianqiao Cui: orcid.org/0000-0002-6016-5574
ORCID for Stefan Bleeck: orcid.org/0000-0003-4378-3394

Catalogue record

Date deposited: 17 Jul 2024 16:45
Last modified: 19 Jul 2024 01:58

Contributors

Creator: Jianqiao Cui
Contributor: Stefan Bleeck
Creator: Vassil Panayotov
Creator: Guoguo Chen
Creator: Daniel Povey
Creator: Sanjeev Khudanpur
Creator: David Snyder
