Automatic geo-alignment of artwork in children's story books
A study was conducted to demonstrate that AI software can translate and generate illustrations without any human intervention. The aim was to present the resulting system to the external customer, Pratham Books, and to distribute it to them. The project aligns with the company's vision by leveraging the generalisation and scalability of machine learning algorithms, offering significant gains in cost efficiency for a wide range of literary audiences in varied geographical locations. A comparative study was used to determine the best-performing of the three methods devised: (1) Prompt Augmentation using Keywords, (2) CLIP Embedding Mask, and (3) Cross Attention Control with Editorial Prompts. The methods were evaluated using both quantitative and qualitative measures. Each had its own strengths and weaknesses, but the evaluation found that method 1, Prompt Augmentation using Keywords, yielded the best results. Image quality may be improved further in future work by incorporating large language models and personalised stylistic models. The presented approach can also be adapted to video and 3D sculpture generation for novel illustrations in digital webbooks.
Dylag, Jakub J.
Suarez, Victor
Wald, James
Uvaraj, Aneesha Amodini
16 March 2023
Text
2304.01204
- Author's Original
More information
Published date: 16 March 2023
Identifiers
Local EPrints ID: 489014
URI: http://eprints.soton.ac.uk/id/eprint/489014
PURE UUID: e2419a1a-7075-4bef-a245-189ed1f43d11
Catalogue record
Date deposited: 11 Apr 2024 16:31
Last modified: 16 Nov 2024 03:08
Contributors
Author:
Jakub J. Dylag
Author:
Victor Suarez
Author:
James Wald
Author:
Aneesha Amodini Uvaraj