The University of Southampton
University of Southampton Institutional Repository

Crowd-annotation and LoD-based semantic indexing of content in multi-disciplinary web repositories to improve search results

Association for Computing Machinery
Khan, Arshad
Tiropanis, Thanassis
Martin, David

Khan, Arshad, Tiropanis, Thanassis and Martin, David (2017) Crowd-annotation and LoD-based semantic indexing of content in multi-disciplinary web repositories to improve search results. In ACSW '17 Proceedings of the Australasian Computer Science Week Multiconference. Association for Computing Machinery. 12 pp. (doi:10.1145/3014812.3014867).

Record type: Conference or Workshop Item (Paper)

Abstract

Searching for relevant information in multi-disciplinary web repositories is a topic of increasing interest in the computer science research community. To date, methods and techniques for extracting useful and relevant information from online repositories of research data have largely been based on static full-text indexing, which entails a 'produce once and use forever' strategy. That strategy is fast becoming insufficient due to increasing data volume, concept obsolescence, and the complexity and heterogeneity of content types in web repositories. We propose that automatic semantic annotation of content in web repositories, using Linked Open Data (LoD) sources and no domain-specific ontologies, can sustain search performance by retrieving highly relevant results. We further claim that expert crowd-annotation of content, layered on top of automatic semantic annotation, can enrich the semantic index over time and augment the contextual value of content in web repositories so that it remains findable despite changes in language, terminology and scientific concepts. We deployed a custom-built annotation, indexing and searching environment on a web repository website, where expert annotators used it to annotate webpages with free text and vocabulary terms. We present our findings based on the annotation and tagging data collected on top of the LoD-based annotations, together with the overall modus operandi. We also show that adding expert annotations to the existing semantic index improves the relationship between queries and documents, measured using Cosine Similarity Measures (CSM).
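
As a rough illustration of the last point, the sketch below computes cosine similarity between a query and a page before and after extra annotation terms are appended to the page's indexed text. It is not the authors' implementation: the bag-of-words weighting, the function names (term_vector, cosine_similarity) and the sample strings are all assumptions made for this example, and the record does not say how the paper builds its vectors.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a simple term-frequency vector (assumed weighting, not the paper's)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


# Hypothetical data: a query, the page's indexed text, and expert-added tags.
query = term_vector("census geography")
page_text = "methods for analysing population data"
expert_tags = "census geography boundaries"

# Similarity against the page text alone, then with the tags appended.
before = cosine_similarity(query, term_vector(page_text))
after = cosine_similarity(query, term_vector(page_text + " " + expert_tags))

print(f"before annotation: {before:.2f}")
print(f"after annotation:  {after:.2f}")
```

In this toy case the query shares no terms with the page text, so the similarity starts at zero and rises once the hypothetical expert tags are indexed alongside it. A real index would normally use a weighting such as TF-IDF rather than raw counts.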

Text
a53-khan.pdf - Version of Record
Restricted to Repository staff only
Available under License Other.

More information

Published date: 30 January 2017
Venue - Dates: Australasian Computer Science Week Multiconference, Deakin University Waterfront Campus, Geelong West, Australia, 2017-01-30 - 2017-02-03
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 405809
URI: http://eprints.soton.ac.uk/id/eprint/405809
PURE UUID: c95c00c0-9520-4fc0-95ed-b95486a3ebd7
ORCID for Thanassis Tiropanis: orcid.org/0000-0002-6195-2852
ORCID for David Martin: orcid.org/0000-0003-0397-0769

Catalogue record

Date deposited: 18 Feb 2017 00:20
Last modified: 16 Mar 2024 03:58

Contributors

Author: Arshad Khan
Author: Thanassis Tiropanis
Author: David Martin
