Theoretical and practical study of audio-visual person identification

Hu, Haoji (2007) Theoretical and practical study of audio-visual person identification. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

This thesis is concerned with combining the audio biometric (voice) and the visual biometric (face) for person identification. To achieve this goal, a speaker identification classifier and a face identification classifier are built and tested on the XM2VTS database. We present both theoretical and practical research on combining these two classifiers to achieve better identification results. Experiments indicate that our approach achieves a very high identification rate on the XM2VTS database.

The main contributions of this thesis lie in three parts: first, we have proposed a new algorithm to adjust weighting parameter(s) for combining independent audio and visual signals; second, we have theoretically proved that there is no ‘perfect’ fusion algorithm suitable for all situations (the ‘no panacea’ theorem); third, we have built an audio-visual person identification system and achieved good performance on the XM2VTS database.

There are several directions for our future research work, which include: (1) developing combination algorithms which are robust to noise and unpredictable situations; (2) combining visual features with the audio-visual classifier; (3) research work on face recognition; (4) generalising the method of finding the optimal weighting parameter to the person verification case; (5) theoretical study of multiple classifier combination; (6) building a real-time audio-visual person recognition system.

Text

1119317.pdf - Version of Record

Available under License University of Southampton Thesis Licence.
