The University of Southampton
University of Southampton Institutional Repository

SpEnD: Linked data SPARQL endpoints discovery using search engines

Yumusak, S., Dogdu, E., Kodaz, H., Kamilaris, A. and Vandenbussche, P.-Y. (2017) SpEnD: Linked data SPARQL endpoints discovery using search engines. IEICE Transactions on Information and Systems: Special Issue on Human Communications, E100.D (4), 758-767. (doi:10.1587/transinf.2016DAP0025).

Record type: Article

Abstract

Linked data endpoints are online query gateways to semantically annotated linked data sources. The SPARQL query language is the standard for querying these data sources. Although a linked data endpoint (i.e. a SPARQL endpoint) is a basic Web service, it provides a platform for federated online querying and data linking methods. For linked data consumers, SPARQL endpoint availability and discovery are crucial for live querying and semantic information retrieval. Current studies show that the availability of linked datasets is very low, while the locations of linked data endpoints change frequently. There are linked data repositories that collect and list the available linked data endpoints or resources. It is observed that around half of the endpoints listed in existing repositories are not accessible (temporarily or permanently offline). These endpoint URLs are shared through repository websites, such as Datahub.io; however, they are weakly maintained and revised only by their publishers. In this study, a novel metacrawling method is proposed for discovering and monitoring linked data sources on the Web. We implemented the method in a prototype system named SPARQL Endpoints Discovery (SpEnD). SpEnD starts with a “search keyword” discovery process for finding relevant keywords for the linked data domain and specifically SPARQL endpoints. Then, the collected search keywords are used to find linked data sources via popular search engines (Google, Bing, Yahoo, Yandex). Using this method, most of the SPARQL endpoints currently listed in existing endpoint repositories, as well as a significant number of new SPARQL endpoints, have been discovered. We analyze our findings in detail in comparison to the Datahub collection.
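
The abstract does not say how a URL returned by a search engine is confirmed to be a live SPARQL endpoint. As an illustration only, and not SpEnD's actual implementation, the following Python sketch shows one plausible check: send a trivial `ASK` query using the GET binding of the SPARQL 1.1 Protocol and inspect the response. The function names, the probe query, the timeout, and the acceptance rules are all assumptions made for this example.

```python
import http.client
import json
import urllib.error
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed probe: a trivial ASK query that any SPARQL endpoint should answer.
PROBE_QUERY = "ASK { ?s ?p ?o }"

# Media types registered for SPARQL query results (JSON and XML).
SPARQL_RESULT_TYPES = (
    "application/sparql-results+json",
    "application/sparql-results+xml",
)


def build_probe_url(candidate_url, query=PROBE_QUERY):
    """Append a `query` parameter to a candidate URL, keeping existing parameters."""
    parts = urlsplit(candidate_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.append(("query", query))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), "")
    )


def looks_like_sparql_response(status, content_type, body):
    """Heuristic (an assumption of this sketch): accept HTTP 200 responses
    that carry a SPARQL results media type, or generic JSON containing the
    `boolean` member of an ASK result."""
    if status != 200:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in SPARQL_RESULT_TYPES:
        return True
    if media_type == "application/json":
        try:
            document = json.loads(body)
        except ValueError:
            return False
        return isinstance(document, dict) and "boolean" in document
    return False


def probe(candidate_url, timeout=10.0):
    """Return True if the candidate URL answers the probe like a SPARQL endpoint."""
    request = urllib.request.Request(
        build_probe_url(candidate_url),
        headers={"Accept": ", ".join(SPARQL_RESULT_TYPES)},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read(65536).decode("utf-8", "replace")
            content_type = response.headers.get("Content-Type", "")
            return looks_like_sparql_response(response.status, content_type, body)
    except (OSError, ValueError, http.client.HTTPException):
        # Unreachable hosts, HTTP errors, timeouts and malformed URLs
        # all count as "not an available endpoint" in this sketch.
        return False
```

Running `probe` periodically over a list of candidate URLs would give a rough availability signal of the kind the abstract refers to; a production crawler would also need rate limiting, redirect handling, and support for endpoints that accept only POST.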

This record has no associated files available for download.

More information

Published date: 2017

Identifiers

Local EPrints ID: 479641
URI: http://eprints.soton.ac.uk/id/eprint/479641
ISSN: 0916-8532
PURE UUID: 0f3b4ea0-6348-4333-b28d-2b64b9332d98

Catalogue record

Date deposited: 26 Jul 2023 16:43
Last modified: 17 Mar 2024 02:35

Contributors

Author: S. Yumusak
Author: E. Dogdu
Author: H. Kodaz
Author: A. Kamilaris
Author: P.-Y. Vandenbussche
