University of Southampton Institutional Repository

ImagenWorld: stress-testing image generation models with explainable human evaluation on open-ended real-world tasks


Sani, Samin Mahdizadeh, Ku, Max and Jamali, Nima, et al. (2026) ImagenWorld: stress-testing image generation models with explainable human evaluation on open-ended real-world tasks. In The Fourteenth International Conference on Learning Representations. 31 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

Advances in diffusion, autoregressive, and hybrid models have enabled high-quality image synthesis for tasks such as text-to-image generation, editing, and reference-guided composition. Yet existing benchmarks remain limited: they either focus on isolated tasks, cover only narrow domains, or provide opaque scores without explaining failure modes. We introduce ImagenWorld, a benchmark of 3.6K condition sets spanning six core tasks (generation and editing, with single or multiple references) and six topical domains (artworks, photorealistic images, information graphics, textual graphics, computer graphics, and screenshots). The benchmark is supported by 20K fine-grained human annotations and an explainable evaluation schema that tags localized object-level and segment-level errors, complementing automated VLM-based metrics. Our large-scale evaluation of 14 models yields several insights: (1) models typically struggle more on editing tasks than on generation tasks, especially local edits; (2) models excel in artistic and photorealistic settings but struggle with symbolic and text-heavy domains such as screenshots and information graphics; (3) closed-source systems lead overall, while targeted data curation (e.g., Qwen-Image) narrows the gap in text-heavy cases; and (4) modern VLM-based metrics achieve Kendall accuracies of up to 0.79, approximating human rankings, but fall short of fine-grained, explainable error attribution. ImagenWorld thus provides both a rigorous benchmark and a diagnostic tool for advancing robust image generation.
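To make two ideas in the abstract concrete, the sketch below illustrates (a) what a localized object- or segment-level error annotation might look like and (b) how a Kendall-style agreement score between a human ranking and a VLM-based metric's ranking can be computed. This is a minimal Python sketch, not the authors' released code: the annotation field names and the exact "Kendall accuracy" formula (assumed here to be the fraction of concordant model pairs, skipping ties) are illustrative assumptions.

```python
# Illustrative sketch only: field names and the accuracy formula are
# assumptions, not the ImagenWorld release.
from itertools import combinations

# Hypothetical fine-grained annotation: one tagged error on one output image.
annotation = {
    "task": "single-reference editing",   # one of the six core tasks
    "domain": "screenshots",              # one of the six topical domains
    "error_level": "object",              # "object" or "segment"
    "bbox": [120, 48, 310, 96],           # pixel region the tag localizes
    "error_type": "text_rendering",       # e.g. garbled UI text
    "note": "button label is illegible",
}

def kendall_accuracy(human_scores, metric_scores):
    """Fraction of model pairs that humans and the metric order the same way
    (concordant pairs / comparable pairs); pairs tied in either ranking are
    skipped."""
    concordant, comparable = 0, 0
    for i, j in combinations(range(len(human_scores)), 2):
        dh = human_scores[i] - human_scores[j]
        dm = metric_scores[i] - metric_scores[j]
        if dh == 0 or dm == 0:
            continue  # tied pair: not comparable
        comparable += 1
        if (dh > 0) == (dm > 0):
            concordant += 1
    return concordant / comparable if comparable else float("nan")

# Toy example: scores for five models from humans vs. a VLM-based metric.
human = [4.2, 3.1, 3.9, 2.5, 4.8]
vlm   = [4.0, 3.3, 3.0, 2.9, 4.6]
print(f"Kendall accuracy: {kendall_accuracy(human, vlm):.2f}")  # 0.90 here
```

Under this pairwise reading, the reported 0.79 would mean the best VLM-based metric orders roughly four in five model pairs the same way human annotators do.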

This record has no associated files available for download.

More information

Published date: 28 February 2026
Venue - Dates: ICLR 2026: The Fourteenth International Conference on Learning Representations, Rio de Janeiro, Brazil, 2026-04-23 - 2026-04-27

Identifiers

Local EPrints ID: 510352
URI: http://eprints.soton.ac.uk/id/eprint/510352
PURE UUID: 20f2de4e-ba82-4032-aeae-f5bc08917e63
ORCID for Edisy Kin Wai Chan: orcid.org/0009-0005-7598-5283

Catalogue record

Date deposited: 27 Mar 2026 17:31
Last modified: 28 Mar 2026 03:16

Contributors

Author: Samin Mahdizadeh Sani
Author: Max Ku
Author: Nima Jamali
Author: Matina Mahdizadeh Sani
Author: Paria Khoshtab
Author: Wei-Chieh Sun
Author: Parnian Fazel
Author: Zhi Rui Tam
Author: Thomas Chong
Author: Edisy Kin Wai Chan
Author: Donald Wai Tong Tsang
Author: Chiao-Wei Hsu
Author: Ting Wai Lam
Author: Ho Yin Sam Ng
Author: Chiafeng Chu
Author: Chak-Wing Mak
Author: Keming Wu
Author: Hiu Tung Wong
Author: Yik Chun Ho
Author: Chi Ruan
Author: Zhuofeng Li
Author: I-Sheng Fang
Author: Shih-Ying Yeh
Author: Ho Kei Cheng
Author: Ping Nie
Author: Wenhu Chen
Corporate Author: et al.

