Large language models to make museum archive collections more accessible
Keywords are essential to the searchability, and therefore the discoverability, of museum and archival collections in the modern world. Without them, the collection management systems (CMS) and online collections that these cultural organisations rely on to record, organise, and make their collections accessible do not operate efficiently. However, generating these keywords manually is time-consuming for these already resource-strapped organisations. Artificial intelligence (AI), particularly generative AI and Large Language Models (LLMs), could hold the key to generating, even automating, this key data and as such be considered a co-creative add-on. This study contributes to the literature by introducing the use of Meta’s open-source LLM, Llama, to generate keywords from curator- or archivist-written descriptions of museum and archival collection items. Our findings suggest that these technologies add significant value compared to current manual methods for keyword generation. In particular, we find that, by using carefully crafted prompts, successful keyword augmentations could be established, making museum and archival collections much more accessible to wider and more diverse audiences. However, the results also showed that generative AI has biases (e.g., hallucinations, overgeneralisations, outdated language), though these occurred less frequently than general perception may suggest. Hence, we also discuss mitigation strategies to address these and how cultural institutions can recognise the risks and errors while getting the most from the systems. Finally, we discuss options to achieve structured results, which allow easier ingestion of data back into CMS. Ultimately, LLMs hold significant potential to enhance accessibility to museum and archival collections, yet they are not without imperfection, as we extensively discuss.
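The abstract mentions carefully crafted prompts and structured results that can be ingested back into a CMS, but this record does not reproduce the prompts or the output format used in the study. The following is therefore only a minimal illustrative sketch, not the authors' method: the prompt wording, the JSON shape with a "keywords" list, and the function names build_prompt and parse_keywords are all assumptions made for the example. The call to the model itself is deliberately left out.

```python
import json

# Hypothetical prompt template; the wording is an assumption, not the study's prompt.
PROMPT_TEMPLATE = (
    "You are assisting a museum cataloguer.\n"
    "Read the item description below and suggest up to {max_keywords} search keywords.\n"
    "Use only information stated in the description.\n"
    'Answer with JSON only, in the form {{"keywords": ["...", "..."]}}.\n\n'
    "Description:\n{description}\n"
)


def build_prompt(description: str, max_keywords: int = 10) -> str:
    """Fill the template with a curator- or archivist-written description."""
    return PROMPT_TEMPLATE.format(
        description=description.strip(), max_keywords=max_keywords
    )


def parse_keywords(raw_response: str, max_keywords: int = 10) -> list[str]:
    """Extract a clean keyword list from a model response.

    Returns an empty list when no valid JSON object is found, so that a
    malformed response is flagged for human review rather than ingested.
    """
    # Models often wrap JSON in extra text, so keep only the outermost braces.
    start = raw_response.find("{")
    end = raw_response.rfind("}")
    if start == -1 or end <= start:
        return []
    try:
        payload = json.loads(raw_response[start : end + 1])
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    candidates = payload.get("keywords", [])
    if not isinstance(candidates, list):
        return []

    seen = set()
    cleaned = []
    for item in candidates:
        if not isinstance(item, str):
            continue  # ignore non-text entries
        keyword = " ".join(item.split())  # normalise whitespace
        key = keyword.casefold()
        if keyword and key not in seen:  # drop blanks and case-insensitive repeats
            seen.add(key)
            cleaned.append(keyword)
    return cleaned[:max_keywords]
```

In a workflow of this kind, the string returned by build_prompt would be sent to whichever model the institution uses, and the list returned by parse_keywords would be reviewed by a curator before being written to the CMS, in line with the mitigation strategies the abstract refers to.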
Generative AI, Keyword augmentation, Keyword generation, Large Language Models, Museum and archive collections
4485-4497
Reusens, Manon
Adams, A.
Baesens, Bart
27 February 2025
Reusens, Manon, Adams, A. and Baesens, Bart (2025) Large language models to make museum archive collections more accessible. AI & Society Journal of Knowledge, Culture and Communication, 40 (6), 4485-4497. (doi:10.1007/s00146-025-02227-8).
Text
BB manuscript
- Accepted Manuscript
Restricted to Repository staff only until 23 January 2026.
Available under License Other.
More information
Accepted/In Press date: 23 January 2025
Published date: 27 February 2025
Identifiers
Local EPrints ID: 505051
URI: http://eprints.soton.ac.uk/id/eprint/505051
ISSN: 0951-5666
PURE UUID: acf16f9e-6d40-4da2-a30a-0e87eb1c8ce5
Catalogue record
Date deposited: 25 Sep 2025 16:35
Last modified: 25 Sep 2025 16:35
Contributors
Author:
Manon Reusens
Author:
A. Adams