Exploiting Linked Open Data (LoD) and Crowdsourcing-based semantic annotation & tagging in web repositories to improve and sustain relevance in search results
University of Southampton, November 2018
Author: Khan, Arshad Ali
Thesis advisor: Tiropanis, Thanassis
Khan, Arshad Ali (2018) Exploiting Linked Open Data (LoD) and Crowdsourcing-based semantic annotation & tagging in web repositories to improve and sustain relevance in search results. University of Southampton, Doctoral Thesis, 301pp.
Record type: Thesis (Doctoral)
Abstract
Online searching of multi-disciplinary web repositories is a topic of increasing importance as the number of repositories increases and their users' skills and backgrounds become more diverse. Earlier term-frequency-based approaches have been improved by ontology-based semantic annotation, but such approaches are predominantly driven by "domain ontologies engineering first" and lack dynamism. Information, by contrast, is dynamic: the meaning of things changes over time, and new concepts are constantly being introduced. Further, no sustainable framework or method has yet been found that can automatically enrich the content of heterogeneous online resources for information retrieval over time. Moreover, the methods and techniques currently applied are fast becoming inadequate because of increasing data volume, concept obsolescence, and the complexity and heterogeneity of content types in web repositories. In the face of such complexity, term matching between a query and the indexed documents can no longer fulfil complex user needs on its own. The ever-growing gap between syntax and semantics must be continually bridged to address these issues and to ensure accurate retrieval of search results for natural-language queries. This thesis investigates the proposition that adding domain-specific expert crowd annotation of content on top of automatic semantic annotation (using Linked Open Data sources) can continually enrich and sustain the contemporary value of content in scientific repositories. A purpose-built annotation, indexing and searching environment has been developed and deployed to a web repository that hosts more than 3,400 heterogeneous web documents. Based on expert crowd annotations, automatic LoD-based named entity extraction, and evaluations of search results, this research finds that search retrieval incorporating the crowd-sourced element performs better than retrieval without it. This thesis also shows that expert and non-expert crowd-sourced annotators can reach consensus when annotating and tagging the content of web repositories using the controlled vocabulary (typology) together with free-text terms and keywords.
Text: Final Thesis (Version of Record)
More information
Published date: November 2018
Identifiers
Local EPrints ID: 428046
URI: http://eprints.soton.ac.uk/id/eprint/428046
PURE UUID: d9158acd-ad63-43c6-9478-11bd043a0233
Catalogue record
Date deposited: 07 Feb 2019 17:30
Last modified: 16 Mar 2024 07:33
Contributors
Author:
Arshad Ali Khan
Thesis advisor:
Thanassis Tiropanis