Building tag hierarchies based on co-occurrences and lexico-syntactic patterns
University of Southampton
Bin Moqhim, Fahad Ibrahim
June 2016
Millard, David
Bin Moqhim, Fahad Ibrahim (2016) Building tag hierarchies based on co-occurrences and lexico-syntactic patterns. University of Southampton, Doctoral Thesis, 165pp.
Record type: Thesis (Doctoral)
Abstract
Knowledge structures, such as taxonomies, are key to the organization and management of Web content, but are expensive to build manually. In this thesis we explore the issues around automatically building effective tag hierarchies from folksonomies (collective social classifications), and propose changes to state-of-the-art methods that improve their performance. These changes aim to tackle the “generality-popularity” problem: popularity is assumed (sometimes inaccurately) to be a proxy for generality, i.e. high-level taxonomic terms are assumed to occur more often than low-level ones.
The effectiveness of this research is demonstrated in four experiments. The first experiment explores whether taxonomic tag pairs captured directly from users change the quality of constructed tag hierarchies. The second experiment examines the possibility of using personal tag relationships constructed by users to improve the accuracy of learned taxonomic tags. The third experiment demonstrates the potential of using lexico-syntactic patterns applied to a closed text corpus to improve the direction of automatically derived tag pairs in order to build higher-quality tag hierarchies. The last experiment investigates the possibility of using an open knowledge repository instead of a closed knowledge resource to increase the coverage of tags in a tag collection, and consequently the quality of learned tag hierarchies.
The results of our experiments show, first, that collecting taxonomic tag pairs increases the semantic quality of the tag hierarchy, but at the expense of expressivity and with some degradation of user experience. Second, personal tag relationships can be used to improve the accuracy of constructed taxonomic tags, but with limited success if the personal tag relationships and the learned taxonomic tags are not extracted from the same tagging system. Third, lexico-syntactic patterns applied to a large closed text corpus (e.g. Wikipedia) can be used to improve the accuracy of the directions of relations constructed between tags by a generality-based approach to tag hierarchy construction. Finally, this accuracy would be improved further if an open corpus (e.g. the Web) were used instead of a closed one, which in turn improves the quality of the learned tag hierarchies in terms of structure and semantics.
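The contrast between a popularity-based direction and a pattern-based direction can be illustrated with a minimal Python sketch. This is not the implementation used in the thesis: the toy tag posts, the toy corpus sentences, the two regular-expression patterns ("X such as Y", "Y is a X"), and the function names `popularity_direction` and `pattern_direction` are all assumptions made for illustration only.

```python
import re
from collections import Counter

# Toy folksonomy: each set holds the tags attached to one resource (made-up data).
posts = [
    {"python", "programming"},
    {"python", "programming"},
    {"python", "tutorial"},
    {"python", "snake"},
    {"java", "programming"},
]

# Toy sentences standing in for a closed text corpus (made-up data).
corpus = [
    "Programming languages such as Python and Java are widely taught.",
    "Python is a programming language.",
]


def popularity_direction(a, b, posts):
    """Return (parent, child), assuming the more frequent tag is the more general."""
    freq = Counter(tag for post in posts for tag in post)
    return (a, b) if freq[a] >= freq[b] else (b, a)


def pattern_direction(a, b, corpus):
    """Return the (parent, child) ordering with the most pattern matches, or None."""
    votes = Counter()
    for parent, child in ((a, b), (b, a)):
        p, c = re.escape(parent), re.escape(child)
        patterns = [
            # "<parent> [one optional word] such as <child>"
            rf"\b{p}\w*(?:\s+\w+)?\s+such as\s+{c}\b",
            # "<child> is a/an <parent>"
            rf"\b{c}\s+is\s+an?\s+{p}\b",
        ]
        for sentence in corpus:
            for pattern in patterns:
                if re.search(pattern, sentence, flags=re.IGNORECASE):
                    votes[(parent, child)] += 1
    if not votes:
        return None  # no textual evidence for either direction
    return votes.most_common(1)[0][0]
```

In this toy data "python" is used more often than "programming", so `popularity_direction("python", "programming", posts)` returns `("python", "programming")`, placing the narrower tag above the broader one. `pattern_direction("python", "programming", corpus)` returns `("programming", "python")` because both sentences support that ordering, and it returns `None` for a pair such as "snake" and "tutorial" that the corpus does not cover, which is the coverage limitation the last experiment addresses.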
Text: Final thesis - Version of Record
More information
Published date: June 2016
Identifiers
Local EPrints ID: 419475
URI: http://eprints.soton.ac.uk/id/eprint/419475
PURE UUID: 4a4a5fb5-87da-4401-a4cd-8ed9ae2d177d
Catalogue record
Date deposited: 12 Apr 2018 16:31
Last modified: 16 Mar 2024 06:21
Contributors
Author: Fahad Ibrahim Bin Moqhim
Thesis advisor: David Millard