Rethinking information retrieval in a re-decentralised web: exploring the feasibility and quality of search across personal online datastores
Traditional information retrieval (IR) models, such as keyword-based and vector-based techniques, have long been used in centralized systems. However, the Web’s re-decentralization, with its focus on data ownership and privacy, calls for a re-evaluation of these methods in decentralized settings. While standards for decentralized search enhance privacy to some extent, they also introduce computational overhead, black-box decision-making, and infrastructure complexity. Despite these challenges, traditional IR techniques remain largely unexplored in such environments. This paper presents an innovative application of traditional IR models in the decentralized Web by adapting them for Personal Online Data Stores (PODs), where search parties have varying access rights. We explore their role in source selection, document ranking, and result merging, extending them to meet decentralized search demands. Using Solid PODs and a synthetic medical dataset, we evaluate these models in a privacy-sensitive environment. Our findings demonstrate that extended IR methods provide an effective balance of performance, interpretability, and efficiency. These approaches hold strong potential as privacy-preserving alternatives for decentralized search on a re-decentralized Web. Notably, our top-performing model achieved competitive results in top-item retrieval compared to centralized search systems, maintaining high relevance scores under both limited and full data access conditions.
Bahrani, Mohammad
Ragab, Mohamed
Oliver, Helen
Tiropanis, Thanassis
Chapman, Adriane
Poulovassilis, Alexandra
Roussos, George
Bahrani, Mohammad, Ragab, Mohamed, Oliver, Helen, Tiropanis, Thanassis, Chapman, Adriane, Poulovassilis, Alexandra and Roussos, George (2025) Rethinking information retrieval in a re-decentralised web: exploring the feasibility and quality of search across personal online datastores. ACM Transactions on the Web. (doi:10.1145/3777445).
Text: TWEB_2025_Final - Accepted Manuscript
Text: 3777445 - Accepted Manuscript
More information
Accepted/In Press date: 31 October 2025
e-pub ahead of print date: 20 November 2025
Identifiers
Local EPrints ID: 507324
URI: http://eprints.soton.ac.uk/id/eprint/507324
ISSN: 1559-114X
PURE UUID: db388871-0d1b-474d-bb36-277496df596e
Catalogue record
Date deposited: 04 Dec 2025 17:49
Last modified: 05 Dec 2025 03:07
Contributors
Author: Mohammad Bahrani
Author: Mohamed Ragab
Author: Helen Oliver
Author: Thanassis Tiropanis
Author: Adriane Chapman
Author: Alexandra Poulovassilis
Author: George Roussos