University of Southampton Institutional Repository

Graph-based visual-semantic entanglement network for zero-shot image recognition

Hu, Yang
3a9d668f-8b65-4a93-b15f-1363e07d44fa
Wen, Guihua
411fd94f-89bd-4ad7-908d-9c876afd7564
Chapman, Adriane
721b7321-8904-4be2-9b01-876c430743f1
Pei, Yang
933fc229-1c3f-4225-8646-d47ce0c684f3
Luo, Mingnan
43faccbb-eead-4787-af0f-d3fbe7f2538b
Xu, Yingxue
d79d4331-b39f-4b6d-9dd5-574926fe7fa4
Dai, Dan
85b7cbb9-cd58-46e1-b7ff-c264e9f46908
Hall, Wendy
11f7f8db-854c-4481-b1ae-721a51d8790c

Hu, Yang, Wen, Guihua, Chapman, Adriane, Pei, Yang, Luo, Mingnan, Xu, Yingxue, Dai, Dan and Hall, Wendy (2021) Graph-based visual-semantic entanglement network for zero-shot image recognition. IEEE Transactions on Multimedia. (In Press)

Record type: Article

Abstract

Zero-shot learning uses semantic attributes to connect the search space of unseen objects. In recent years, although deep convolutional networks have brought powerful visual modeling capabilities to the ZSL task, their visual features suffer from severe pattern inertia and lack representation of semantic relationships, which leads to severe bias and ambiguity. In response, we propose the Graph-based Visual-Semantic Entanglement Network, which conducts graph modeling of visual features and maps them to semantic attributes using a knowledge graph. It contains several novel designs: 1. it establishes a multi-path entangled network of a convolutional neural network (CNN) and a graph convolutional network (GCN), feeding visual features from the CNN into the GCN to model implicit semantic relations and then feeding the graph-modeled information from the GCN back into the CNN features; 2. it uses attribute word vectors as the target for the GCN's graph semantic modeling, which forms a self-consistent regression for graph modeling and supervises the GCN to learn more personalized attribute relations; 3. it fuses the hierarchical visual-semantic features refined by graph modeling into the visual embedding as a supplement. By promoting the semantic linkage modeling of visual features, our method outperforms state-of-the-art approaches on multiple representative ZSL datasets: AwA2, CUB, and SUN.
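
To make the architecture described in the abstract concrete, the following is a minimal PyTorch sketch of one CNN-GCN entanglement step under our own assumptions: the class names (GraphConv, EntangledBlock), the pooling scheme, the placeholder attribute adjacency, and the channel-wise feedback gate are illustrative choices, not the authors' implementation; the accepted manuscript available from this record contains the actual design.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphConv(nn.Module):
    """One graph convolution: H' = A_hat @ (H W), in the Kipf-and-Welling style."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj_norm):
        # h: (B, K, in_dim) node features; adj_norm: (K, K) normalized adjacency
        return adj_norm @ self.weight(h)

class EntangledBlock(nn.Module):
    """Pools a CNN feature map onto K attribute nodes, runs a small GCN over the
    attribute graph, regresses the nodes toward attribute word vectors, and feeds
    the graph-refined signal back as a channel-wise gate on the CNN features."""
    def __init__(self, channels, num_attrs, word_dim):
        super().__init__()
        self.to_nodes = nn.Linear(channels, num_attrs)        # visual context -> node activations
        self.gcn1 = GraphConv(1, 64)
        self.gcn2 = GraphConv(64, word_dim)                   # node output matched to word vectors
        self.feedback = nn.Linear(num_attrs * word_dim, channels)

    def forward(self, feat_map, adj_norm):
        b, c, _, _ = feat_map.shape
        pooled = feat_map.mean(dim=(2, 3))                    # (B, C) global visual context
        nodes = self.to_nodes(pooled).unsqueeze(-1)           # (B, K, 1) one activation per attribute
        g = F.relu(self.gcn1(nodes, adj_norm))
        g = self.gcn2(g, adj_norm)                            # (B, K, word_dim)
        gate = torch.sigmoid(self.feedback(g.flatten(1)))     # (B, C) feedback into the CNN path
        return feat_map * gate.view(b, c, 1, 1), g

# Usage sketch: 85 attributes (as in AwA2) with 300-d word vectors.
K, D = 85, 300
adj = torch.eye(K) + 0.1 * torch.rand(K, K)                   # placeholder attribute graph
adj_norm = adj / adj.sum(dim=1, keepdim=True)                 # simple row normalization
block = EntangledBlock(channels=512, num_attrs=K, word_dim=D)
feat = torch.randn(4, 512, 14, 14)                            # e.g. an intermediate ResNet stage
refined, node_emb = block(feat, adj_norm)
attr_vecs = torch.randn(K, D)                                 # stand-in for pretrained word vectors
reg_loss = F.mse_loss(node_emb, attr_vecs.expand(4, K, D))    # "self-consistent regression" term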

Text
Graph-based Visual-Semantic Entanglement Network for Zero-shot Image Recognition - Accepted Manuscript
Download (1MB)

More information

Accepted/In Press date: 12 June 2021

Identifiers

Local EPrints ID: 450317
URI: http://eprints.soton.ac.uk/id/eprint/450317
ISSN: 1520-9210
PURE UUID: a4881702-63c3-4769-adf8-e190df47912d
ORCID for Adriane Chapman: orcid.org/0000-0002-3814-2587
ORCID for Wendy Hall: orcid.org/0000-0003-4327-7811

Catalogue record

Date deposited: 22 Jul 2021 16:31
Last modified: 17 Mar 2024 03:46

Contributors

Author: Yang Hu
Author: Guihua Wen
Author: Adriane Chapman
Author: Yang Pei
Author: Mingnan Luo
Author: Yingxue Xu
Author: Dan Dai
Author: Wendy Hall

