Dataset in support of the thesis 'Speech enhancement by using deep learning algorithms'
The source code and audio datasets of my PhD project.
1. https://www.openslr.org/12
LibriSpeech is a corpus of approximately 1000 hours of 16 kHz read English speech, prepared by Vassil Panayotov with the assistance of Daniel Povey. The data is derived from read audiobooks from the LibriVox project, and has been carefully segmented and aligned.
Acoustic models trained on this data set are available at kaldi-asr.org, and language models suitable for evaluation can be found at http://www.openslr.org/11/.
For more information, see the paper "LibriSpeech: an ASR corpus based on public domain audio books" by Vassil Panayotov, Guoguo Chen, Daniel Povey and Sanjeev Khudanpur, ICASSP 2015.
2. https://www.openslr.org/17
MUSAN is a corpus of music, speech, and noise recordings.
The MUSAN corpus was supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1232825 and by Spoken Communications.
You can cite the data using the following BibTeX entry:
@misc{musan2015,
  author = {David Snyder and Guoguo Chen and Daniel Povey},
  title = {{MUSAN}: {A} {M}usic, {S}peech, and {N}oise {C}orpus},
  year = {2015},
  eprint = {1510.08484},
  note = {arXiv:1510.08484v1}
}
3. source_code.zip
Source code for parts of my PhD project.
4. SJ_EXP.zip
The program for the subjective experiment described in the last chapter of the thesis.
speech enhancement
University of Southampton
Cui, Jianqiao
3961d0d6-9687-4fbc-9e17-93be8bd86a36
Bleeck, Stefan
c888ccba-e64c-47bf-b8fa-a687e87ec16c
Panayotov, Vassil
6e0f90aa-1e50-461f-ad40-fc7368ec4887
Chen, Guoguo
b3658283-69ac-4a07-aa06-0937ba4fd346
Povey, Daniel
a3d1e3b4-7973-4ece-89c9-95378da60e81
Khudanpur, Sanjeev
6c25d25e-ecfe-45d4-a1a7-bede5fca36d1
Snyder, David
cf6e2418-fa76-4ad3-aa80-960accb6ec67
Cui, Jianqiao, Panayotov, Vassil, Chen, Guoguo, Povey, Daniel, Khudanpur, Sanjeev and Snyder, David (2024) Dataset in support of the thesis 'Speech enhancement by using deep learning algorithms'. University of Southampton. doi:10.5258/SOTON/D3161 [Dataset]
Text: thesis_readme_Cui.txt (Dataset)
Archive: source_code.zip (Model). Available under License All Rights Reserved.
Archive: SJ_EXP.zip (Model). Available under License All Rights Reserved.
More information
Published date: 2024
Keywords: speech enhancement
Identifiers
Local EPrints ID: 492128
URI: http://eprints.soton.ac.uk/id/eprint/492128
PURE UUID: 4fbd2d4f-7fc8-42ea-bcfc-ca08cdca05aa
Catalogue record
Date deposited: 17 Jul 2024 16:45
Last modified: 19 Jul 2024 01:58
Contributors
Creator: Jianqiao Cui
Creator: Vassil Panayotov
Creator: Guoguo Chen
Creator: Daniel Povey
Creator: Sanjeev Khudanpur
Creator: David Snyder