Exploiting Linked Open Data (LoD) and Crowdsourcing-based semantic annotation & tagging in web repositories to improve and sustain relevance in search results

Online searching of multi-disciplinary web repositories is a topic of increasing importance as the number of repositories grows and the skills and backgrounds of their users diversify. Earlier term-frequency-based approaches have been improved upon by ontology-based semantic annotation, but such approaches are predominantly driven by "domain ontologies engineering first" and lack dynamicity, whereas information is dynamic: the meaning of things changes over time, and new concepts are constantly being introduced. Moreover, no sustainable framework or method has yet been found that can automatically enrich the content of heterogeneous online resources for information retrieval over time. The methods and techniques currently applied are also fast becoming inadequate owing to increasing data volume, concept obsolescence, and the complexity and heterogeneity of content types in web repositories. Given this complexity, term matching between a query and the indexed documents will no longer, on its own, fulfil complex user needs. The ever-growing gap between syntax and semantics must be continually bridged to address these issues and to ensure that accurate search results are retrieved for natural language queries. This thesis investigates the hypothesis that domain-specific expert crowd annotation of content, layered on top of automatic semantic annotation using Linked Open Data sources, can continually enrich and sustain the contemporary value of content in scientific repositories. A purpose-built annotation, indexing and searching environment has been developed and deployed to a web repository that hosts more than 3,400 heterogeneous web documents. Based on expert crowd annotations, automatic LoD-based named entity extraction and evaluations of search results, this research finds that search results retrieval that includes a crowd-sourced element performs better than retrieval without one. This thesis also shows that consensus can be reached between expert and non-expert crowd-sourced annotators when annotating and tagging the content of web repositories using a controlled vocabulary (typology) and free-text terms and keywords.

University of Southampton

Khan, Arshad Ali

November 2018

Tiropanis, Thanassis

Khan, Arshad Ali (2018) Exploiting Linked Open Data (LoD) and Crowdsourcing-based semantic annotation & tagging in web repositories to improve and sustain relevance in search results. University of Southampton, Doctoral Thesis, 301pp.

Record type: Thesis (Doctoral)