Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning
13638-13646
Zhou, Hongkuan
4d7462bb-31e9-4b4b-8685-64e118034d8a
Halilaj, Lavdim
010e7fdd-8b96-466f-8712-7e18d3386bca
Monka, Sebastian
3794b4c9-3572-4090-810a-4ee2e38b4488
Schmid, Stefan
5ed75bbb-b268-4244-9ebd-9d289c9fbd73
Zhu, Yuqicheng
e2164ad8-3ba3-4dd5-9a6f-81f253f938c6
Wu, Jingcheng
77c3afb5-0d1b-475d-800c-7b1c2ac48a3f
Nazer, Nadeem
c2ef5c35-31cd-4549-9205-ebe48a1326d1
Staab, Steffen
bf48d51b-bd11-4d58-8e1c-4e6e03b30c49
14 March 2026
Zhou, Hongkuan, Halilaj, Lavdim, Monka, Sebastian, Schmid, Stefan, Zhu, Yuqicheng, Wu, Jingcheng, Nazer, Nadeem and Staab, Steffen (2026) Seeing and Knowing in the Wild: Open-domain Visual Entity Recognition with Large-scale Knowledge Graphs via Contrastive Learning. 40th AAAI Conference on Artificial Intelligence, Singapore, Singapore. 20 - 27 Jan 2026. (doi:10.1609/aaai.v40i16.38370).
Record type: Conference or Workshop Item (Paper)
Abstract
Open-domain visual entity recognition aims to identify and link entities depicted in images to a vast and evolving set of real-world concepts, such as those found in Wikidata. Unlike conventional classification tasks with fixed label sets, it operates under open-set conditions, where most target entities are unseen during training and exhibit long-tail distributions. This makes the task inherently challenging due to limited supervision, high visual ambiguity, and the need for semantic disambiguation. We propose a Knowledge-guided Contrastive Learning (KnowCoL) framework that combines both images and text descriptions into a shared semantic space grounded by structured information from Wikidata. By abstracting visual and textual inputs to a conceptual level, the model leverages entity descriptions, type hierarchies, and relational context to support zero-shot entity recognition. We evaluate our approach on the OVEN benchmark, a large-scale open-domain visual recognition dataset with Wikidata IDs as the label space. Our experiments show that using visual, textual, and structured knowledge greatly improves accuracy, especially for rare and unseen entities. Our smallest model improves the accuracy on unseen entities by 10.5% compared to the state-of-the-art, despite being 35× smaller.
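The shared image-text space described in the abstract can be illustrated with a generic contrastive objective. The sketch below is not the KnowCoL loss: it is a plain symmetric InfoNCE loss of the kind used in CLIP-style training, followed by a nearest-neighbour lookup for zero-shot recognition. It omits the Wikidata type hierarchies and relational context entirely. The function names, the temperature value of 0.07, and the entity keys `ENTITY_A` and `ENTITY_B` are assumptions made for illustration and do not come from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of matched (image, text) pairs.

    Row k of image_embs is assumed to describe the same entity as row k of
    text_embs; every other row in the batch acts as a negative.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled similarity between image i and text j
    logits = [[dot(i, t) / temperature for t in txts] for i in imgs]

    def cross_entropy(rows):
        # Mean negative log-probability of the matching (diagonal) entry.
        total = 0.0
        for k, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[k]
        return total / n

    text_to_image = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(text_to_image))


def recognize(image_emb, entity_embs):
    """Zero-shot lookup: return the key of the most similar entity embedding.

    entity_embs maps an entity identifier (hypothetical keys here) to an
    embedding, for example one computed from the entity's text description.
    """
    q = normalize(image_emb)
    return max(entity_embs, key=lambda eid: dot(q, normalize(entity_embs[eid])))


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    print(info_nce(aligned, aligned))
    print(recognize([0.9, 0.1], {"ENTITY_A": [1.0, 0.0], "ENTITY_B": [0.0, 1.0]}))
```

Because recognition is a similarity lookup rather than a fixed classifier head, entities unseen during training can be added by embedding their descriptions, which is the property the abstract relies on for zero-shot recognition.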
More information
Accepted/In Press date: 1 December 2025
Published date: 14 March 2026
Venue - Dates: 40th AAAI Conference on Artificial Intelligence, Singapore, Singapore, 2026-01-20 - 2026-01-27
Identifiers
Local EPrints ID: 510563
URI: http://eprints.soton.ac.uk/id/eprint/510563
PURE UUID: b06fdac9-0e2b-43e5-94f2-8b48fed0aa26
Catalogue record
Date deposited: 13 Apr 2026 16:57
Last modified: 29 May 2026 01:48
Contributors
Author: Hongkuan Zhou
Author: Lavdim Halilaj
Author: Sebastian Monka
Author: Stefan Schmid
Author: Yuqicheng Zhu
Author: Jingcheng Wu
Author: Nadeem Nazer
Author: Steffen Staab