Theoretical and practical study of audio-visual person identification
This thesis is concerned with combining the audio biometric (voice) and the visual biometric (face) for person identification. To this end, a speaker identification classifier and a face identification classifier are built and tested on the XM2VTS database. We present both theoretical and practical work on combining these two classifiers to achieve better identification results. Experiments indicate that our approach achieves a very high identification rate on the XM2VTS database.
The main contributions of this thesis are threefold: first, we propose a new algorithm for adjusting the weighting parameter(s) used to combine independent audio and visual signals; second, we prove theoretically that no ‘perfect’ fusion algorithm is suitable for all situations (the ‘no panacea’ theorem); third, we build an audio-visual person identification system that achieves good performance on the XM2VTS database.
Several directions remain for future research: (1) developing combination algorithms that are robust to noise and unpredictable situations; (2) combining visual features with the audio-visual classifier; (3) further research on face recognition; (4) generalising the method of finding the optimal weighting parameter to person verification; (5) theoretical study of multiple classifier combination; (6) building a real-time audio-visual person recognition system.
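For readers unfamiliar with weighted score fusion, the idea of a weighting parameter for combining an audio classifier and a face classifier can be illustrated with a minimal sketch. This is not the algorithm proposed in the thesis: it assumes a plain weighted sum of per-identity scores and a simple grid search over the weight on held-out trials. The function names (`fuse_scores`, `identify`, `pick_alpha`), the weight `alpha`, and the toy scores are all hypothetical.

```python
def fuse_scores(audio_scores, face_scores, alpha):
    """Weighted sum of per-identity scores (assumed rule, not the thesis's).

    alpha weights the audio classifier, 1 - alpha weights the face classifier.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(audio_scores) != len(face_scores):
        raise ValueError("both classifiers must score the same identities")
    return [alpha * a + (1.0 - alpha) * f
            for a, f in zip(audio_scores, face_scores)]


def identify(audio_scores, face_scores, alpha):
    """Return the index of the identity with the highest fused score."""
    fused = fuse_scores(audio_scores, face_scores, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


def pick_alpha(audio_val, face_val, labels, steps=10):
    """Grid-search alpha over [0, 1]; keep the first value with the best accuracy."""
    best_alpha, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        alpha = k / steps
        hits = sum(identify(a, f, alpha) == y
                   for a, f, y in zip(audio_val, face_val, labels))
        acc = hits / len(labels)
        if acc > best_acc:
            best_alpha, best_acc = alpha, acc
    return best_alpha, best_acc


# Toy held-out trials (made-up scores): each row scores three enrolled identities.
audio_val = [[0.9, 0.1, 0.0], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25]]
face_val = [[0.3, 0.6, 0.1], [0.1, 0.8, 0.1], [0.1, 0.2, 0.7]]
labels = [0, 1, 2]

alpha, accuracy = pick_alpha(audio_val, face_val, labels)
```

On these toy trials each classifier alone misidentifies one person, while an intermediate weight identifies all three correctly, which is the motivation for tuning the weight rather than fixing it.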
University of Southampton
Hu, Haoji
6b95a44c-e208-4247-acad-f4382ea41c10
2007
Hu, Haoji (2007) Theoretical and practical study of audio-visual person identification. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
Text: 1119317.pdf (Version of Record)
More information
Published date: 2007
Identifiers
Local EPrints ID: 466335
URI: http://eprints.soton.ac.uk/id/eprint/466335
PURE UUID: ee068716-c0c3-41d9-8fe6-8167bea62173
Catalogue record
Date deposited: 05 Jul 2022 05:11
Last modified: 16 Mar 2024 20:38
Contributors
Author: Haoji Hu