TriGait: hybrid fusion strategy for multimodal alignment and integration in gait recognition
Sun, Yan
Feng, Xueling
Liu, Xiaolei
Ma, Liyan
Hu, Long
Nixon, Mark
Sun, Yan, Feng, Xueling, Liu, Xiaolei, Ma, Liyan, Hu, Long and Nixon, Mark (2024) TriGait: hybrid fusion strategy for multimodal alignment and integration in gait recognition. IEEE Transactions on Biometrics, Behavior, and Identity Science, [TBIOM-2024-02-0015]. (doi:10.1109/TBIOM.2024.3435046).
Abstract
Due to the inherent limitations of single modalities, multimodal fusion has become increasingly popular in many computer vision fields, leveraging the complementary advantages of unimodal methods. As an emerging biometric technology with great application potential, gait recognition faces similar challenges. The prevailing silhouette-based and skeleton-based gait recognition methods have complementary limitations: one focuses on appearance information while neglecting structural details, and the other does the opposite. Multimodal gait recognition, which combines silhouette and skeleton, promises more robust predictions. However, it is essential yet difficult to explore the implicit interaction between dense pixels and discrete coordinate points. Most existing multimodal gait recognition methods simply concatenate features from silhouette and skeleton and do not fully exploit the complementarity between them. This paper presents a hybrid fusion strategy called TriGait, a three-branch model that thoroughly explores the interaction and complementarity of the two modalities. To address data heterogeneity and exploit the mutual information of the two modalities, we propose a cross-modal token generator (CMTG) within a fusion branch to align and fuse the low-level features of the two modalities. Additionally, TriGait has two further branches for extracting high-level semantic information from silhouette and skeleton. By combining low-level correlation information with high-level semantic information, TriGait provides a comprehensive and discriminative representation of a subject’s gait. Extensive experimental results on CASIA-B, Gait3D and OUMVLP demonstrate the effectiveness of TriGait. Remarkably, TriGait achieves rank-1 mean accuracies of 96.6%, 61.4% and 91.1% on CASIA-B, Gait3D and OUMVLP respectively, outperforming state-of-the-art methods. The source code will be available at: https://github.com/YanSun-github/TriGait/.
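For readers who want a concrete picture of the layout the abstract describes, below is a minimal, hypothetical PyTorch sketch of a three-branch model with a cross-modal token generator fusing silhouette and skeleton tokens. All module names, tensor shapes, and the attention-based fusion here are illustrative assumptions, not the authors' implementation; the actual TriGait code is at the GitHub link above.

# Minimal sketch (assumptions throughout): two unimodal branches extract
# high-level features, while a hypothetical CMTG-style fusion branch
# projects both modalities into a shared token space and fuses them with
# self-attention over the joint token sequence.
import torch
import torch.nn as nn

class CMTGSketch(nn.Module):
    """Hypothetical cross-modal token generator: aligns silhouette and
    skeleton features as one token sequence, then fuses by attention."""
    def __init__(self, sil_dim, ske_dim, token_dim, num_heads=4):
        super().__init__()
        self.sil_proj = nn.Linear(sil_dim, token_dim)
        self.ske_proj = nn.Linear(ske_dim, token_dim)
        self.attn = nn.MultiheadAttention(token_dim, num_heads, batch_first=True)

    def forward(self, sil_tokens, ske_tokens):
        # Project both modalities into the shared token space and concatenate.
        tokens = torch.cat([self.sil_proj(sil_tokens),
                            self.ske_proj(ske_tokens)], dim=1)
        fused, _ = self.attn(tokens, tokens, tokens)
        return fused.mean(dim=1)  # pooled low-level cross-modal embedding

class TriBranchSketch(nn.Module):
    """Three branches (silhouette, skeleton, fusion) whose outputs are
    concatenated into the final gait representation."""
    def __init__(self, sil_dim=128, ske_dim=64, token_dim=128, out_dim=256):
        super().__init__()
        self.sil_branch = nn.Sequential(nn.Linear(sil_dim, out_dim), nn.ReLU())
        self.ske_branch = nn.Sequential(nn.Linear(ske_dim, out_dim), nn.ReLU())
        self.fusion_branch = CMTGSketch(sil_dim, ske_dim, token_dim)
        self.head = nn.Linear(2 * out_dim + token_dim, out_dim)

    def forward(self, sil_tokens, ske_tokens):
        high_sil = self.sil_branch(sil_tokens).mean(dim=1)      # appearance
        high_ske = self.ske_branch(ske_tokens).mean(dim=1)      # structure
        low_fused = self.fusion_branch(sil_tokens, ske_tokens)  # correlation
        return self.head(torch.cat([high_sil, high_ske, low_fused], dim=-1))

# Toy usage: 8 sequences, 44 silhouette patch tokens, 17 skeleton joints.
sil = torch.randn(8, 44, 128)
ske = torch.randn(8, 17, 64)
emb = TriBranchSketch()(sil, ske)
print(emb.shape)  # torch.Size([8, 256])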
Text: Final_TriGait__Hybrid_Fusion_Strategy_for_Multimodal_Alignment_and_Integration_in_Gait_Recognition - Accepted Manuscript
More information
Accepted/In Press date: 20 July 2024
e-pub ahead of print date: 29 July 2024
Identifiers
Local EPrints ID: 492858
URI: http://eprints.soton.ac.uk/id/eprint/492858
ISSN: 2637-6407
PURE UUID: 0fca8104-568b-4e5a-87f9-ee5ce7a0b15f
Catalogue record
Date deposited: 16 Aug 2024 16:37
Last modified: 17 Aug 2024 01:32
Contributors
Author:
Yan Sun
Author:
Xueling Feng
Author:
Xiaolei Liu
Author:
Liyan Ma
Author:
Long Hu
Author:
Mark Nixon