The University of Southampton
University of Southampton Institutional Repository

Classification of linked data sources using semantic scoring

Yumusak, S., Dogdu, E. and Kodaz, H. (2018) Classification of linked data sources using semantic scoring. IEICE Transactions on Information and Systems: Special Issue on Human Communications, E101.D (1), 99-107. (doi:10.1587/transinf.2017SWP0011).

Record type: Article

Abstract

Linked data sets are created using Semantic Web technologies; they are usually large, and their number is growing. Query execution is therefore costly, and knowing the content of such data sets should help in targeted querying. Our aim in this paper is to classify linked data sets by their knowledge content. Earlier projects such as LOD Cloud, LODStats, and SPARQLES analyze linked data sources in terms of content, availability, and infrastructure; in these projects, linked data sets are classified and tagged principally using the VoID vocabulary. Although all linked data sources listed in these projects appear to be classified or tagged, there are few studies on automated tagging and classification of newly arriving linked data sets. Here, we focus on automated classification of linked data sets using semantic scoring methods. We collected the SPARQL endpoints of 1,328 unique linked data sets from the Datahub, LOD Cloud, LODStats, SPARQLES, and SpEnD projects. We then queried the textual descriptions of resources in these data sets using their rdfs:comment and rdfs:label property values. We analyzed these texts in a manner similar to document analysis techniques, treating every SPARQL endpoint as a separate document. To this end, we used the WordNet semantic relations library combined with an adapted term frequency-inverse document frequency (tf-idf) analysis on the words and their semantic neighbours. Using the WordNet database, we extracted information about the comment/label objects in linked data sources through the hypernym, hyponym, homonym, meronym, region, topic, and usage semantic relations. We obtained some significant results for the hypernym and topic semantic relations: we can find words that identify data sets, and these words can be used in automatic classification and tagging of linked data sources. Using these words, we experimented with different classifiers and different scoring methods, which resulted in better classification accuracy.
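The scoring idea in the abstract can be illustrated with a small sketch: each endpoint's label/comment text is treated as one document, terms are expanded with semantic neighbours, and tf-idf picks out the words that distinguish an endpoint. This is not the authors' implementation. The record does not describe the paper's adapted tf-idf, so the textbook formula is used here; the SPARQL query, the endpoint names, the sample texts, and the hand-written `TOY_HYPERNYMS` map (a stand-in for real WordNet lookups) are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical query for pulling label/comment text from one endpoint;
# the queries actually used in the paper are not given in this record.
LABEL_COMMENT_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?text WHERE {
  { ?s rdfs:label ?text } UNION { ?s rdfs:comment ?text }
} LIMIT 1000
"""

# Toy stand-in for WordNet hypernym lookups (hand-written, not WordNet data).
TOY_HYPERNYMS = {
    "protein": "molecule",
    "gene": "sequence",
    "river": "stream",
    "city": "municipality",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def expand(tokens, hypernyms):
    """Return the tokens plus the hypernym of every token that has one."""
    expanded = list(tokens)
    expanded.extend(hypernyms[t] for t in tokens if t in hypernyms)
    return expanded


def tfidf(docs):
    """Standard tf-idf. docs maps a name to a list of terms."""
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in docs.values():
        doc_freq.update(set(terms))
    scores = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        scores[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def top_terms(scores, k=2):
    """Pick the k highest-scoring terms per document (ties broken alphabetically)."""
    return {
        name: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
        for name, s in scores.items()
    }


# Made-up text standing in for the query results of two endpoints.
endpoint_texts = {
    "bio-endpoint": "protein gene protein enzyme resource",
    "geo-endpoint": "river city river lake resource",
}

docs = {name: expand(tokenize(text), TOY_HYPERNYMS)
        for name, text in endpoint_texts.items()}
scores = tfidf(docs)
candidate_tags = top_terms(scores)
```

In this toy input, a word shared by every endpoint (such as "resource") gets a score of zero, while frequent endpoint-specific words and their hypernyms rise to the top as candidate tags. A real run would replace the toy map with WordNet lookups and the sample strings with text fetched from each SPARQL endpoint.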

Text
E101.D_2017SWP0011 - Version of Record
Available under License Other.

More information

e-pub ahead of print date: 1 January 2018

Identifiers

Local EPrints ID: 478826
URI: http://eprints.soton.ac.uk/id/eprint/478826
ISSN: 0916-8532
PURE UUID: fad796c4-565b-4c09-bbfd-2aace3b5f854

Catalogue record

Date deposited: 11 Jul 2023 16:57
Last modified: 17 Mar 2024 02:35

Contributors

Author: S. Yumusak
Author: E. Dogdu
Author: H. Kodaz
