Luc Moreau, Simon Miles, Juri Papay, Keith Decker, Terry Payne
Electronics and Computer Science
University of Southampton, UK
Service discovery in large scale, open distributed systems is difficult because of the need to filter out services suitable to the task at hand from a potentially huge pool of possibilities. Semantic descriptions have been advocated as the key to expressive service discovery, but the most commonly used service descriptions and registry protocols do not support such descriptions in a general manner. In this paper, we present an approach and implementation for service registration and discovery that uses an RDF triple store to express semantic service descriptions and other task/user-specific metadata, using a mechanism for attaching structured and unstructured metadata. The result is an extremely flexible service registry that can be the basis of a sophisticated semantically-enhanced service discovery engine, an essential component of a Semantic Grid.
Service discovery is a difficult task in large scale, open distributed systems such as the Grid and Web, due to the potentially large number of services advertised. In order to filter out the most suitable services for the task at hand, many have advocated the use of semantic descriptions that qualify functional and non-functional characteristics of services in a manner that is amenable to automatic processing [2,5,13].
Semantic discovery is the process of discovering services capable of meaningful interactions, even though the languages or structures with which they are described may be different. Typically, a semantic discovery process relies on semantic annotations, containing high-level abstract descriptions of service requirements and behaviour. In this paper, we focus on the means to register such semantic annotations, and discover services using them.
Current standards in the Web Services and Grid communities (including UDDI and WSDL) do not directly support semantic discovery of services. However, given that these standards have been agreed upon by the community, their existence promotes inter-operability with components such as workflow enactment engines.
An essential element in semantic discovery is the ability to augment service descriptions with additional information, i.e. metadata. Providers may adopt various ways of describing their services, access policies, contract negotiation details etc. However, many resource consumers also impose their own selection policies on the services they prefer to utilise, such as provenance, derived quality of service, reputation metrics etc. Furthermore, it is useful to add such metadata not only to service descriptions, but also to any other concept that may influence the discovery process, e.g. supported operations, types of arguments, businesses, users. Such metadata may be structured according to published ontologies, facilitating unambiguous interpretation by multiple users, especially in the case of a public registry; alternatively, such metadata may also be raw and unstructured, in the case of a personal registry used by a single user.
Since current Grid and Web Services standards do not support semantic service descriptions, we believe that an information model supporting not only UDDI and WSDL descriptions, but also general metadata attachment, would provide us with a uniform way of querying and navigating service information. We see the use of RDF triples (subject, predicate, object) as the means to represent all the information in a uniform manner. This information will be stored in a triple store, which can be queried uniformly through the use of a query language such as RDQL. Besides an information model, it is also critical to offer programmatic interfaces that would allow both publishers and third-party users to register their semantic information. Therefore, we have implemented the UDDI interface to this triple store, and additional interfaces to publish metadata and discover services according to metadata.
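As an illustration of the uniform information model, the following is a minimal sketch of an in-memory triple store with wildcard matching. It is not the triple store used in our implementation; the class `TripleStore`, the `ex:` property names and the operation name `ex:runBlast` are hypothetical and introduced only for this example.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    """Toy in-memory store of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples matching the pattern; None acts as a wildcard."""
        for t in sorted(self._triples):
            if ((s is None or t[0] == s)
                    and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


store = TripleStore()
# Service, operation, message part and metadata all live in the same model.
# All "ex:" identifiers are assumptions made for this sketch.
store.add("ex:blastService", "ex:hasName", "BLAST")
store.add("ex:blastService", "ex:hasOperation", "ex:runBlast")
store.add("ex:runBlast", "ex:hasInputPart", "ex:in0")
store.add("ex:in0", "ex:hasMetadata", "mygrid2:nucleotide_sequence_data")

# The same matching call works for services, operations or message parts.
services_with_ops = [s for s, _, _ in store.match(p="ex:hasOperation")]
about_service = list(store.match(s="ex:blastService"))
```

Because every entity is a node in the same graph, no separate lookup mechanism is needed for services, operations and argument types.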
Our work is a component of the myGrid architecture for semantic service discovery (www.mygrid.org.uk). The functionality we are discussing here allows the attachment of metadata in the form of semantic annotations to service descriptions; such semantic descriptions can be retrieved, and used for reasoning by a Semantic Find component, whose description and interaction with the current component are discussed in a companion paper. The specific contributions and the remaining sections of this paper are the following:
The UDDI service directory (Universal Description, Discovery, and Integration) has become the de-facto standard for service discovery in the Web Services community. Service queries are typically white or yellow pages based: services are located based on a description of their provider or a specific classification (taken from a published taxonomy) of the desired service type. A query typically returns a list of available services, from which a subset may conform to a known and/or informally agreed upon policy and thus can be invoked. Such approaches work well within small, closed communities, where a priori definitions of signatures and data formats can be defined. However, across open systems, no assumption can be made about how desired services are described, how to interact with them, and how to interpret their corresponding results. Additionally, service providers typically adopt different ways to model and present services, often because of the subtle differences in the service itself. This raises the problem of semantic inter-operability, which is the capability of computer systems to operate in conjunction with one another, even though the languages or structures with which they are described may be different. Semantic discovery is the process of discovering services capable of semantic inter-operability.
Current standards in the Web Services and Grid communities do not support semantic discovery of services. UDDI supports a construct called tModel which essentially serves two purposes: it can serve as a namespace for a taxonomy or as a proxy for a technical specification that lives outside the registry. We believe that such a tModel construct has some intrinsic limitations. While there is no doubt that service classifications are useful, services are not the only entities to be classified. For instance, classifications can also be defined for individual operations or their argument types. However, it is not convenient to use searching mechanisms for services that are distinct from those for their argument types. Likewise, a tModel's reference to an external technical specification, such as a WSDL file describing a service interface, also implies that a different mechanism is required for reasoning over service interfaces.
UDDI provides no data structures to represent either the abstract or concrete details contained within a WSDL document, but only a standard way to express that a service implements a particular WSDL interface. A new proposal allows tModels to reference specific bindings and port types. However, this extension still does not provide access to, or queries over, operations or messages, which would allow the discovery of services capable of specific operations.
WSDL, the interface definition language of Web Services, itself suffers from some limitations, as illustrated by Figure 1 displaying the interface of an existing bioinformatics service (BLAST). It identifies a portType composed of one operation, which takes an input message comprising two message parts in0 and in1. These parts are required to be of type string, but the specification does not tell us what the meaning of these strings is supposed to be. In fact, these are supposed to be biological sequences, for which many formats are supported. This example was chosen because it precisely illustrates limitations of existing service descriptions. While this interface specification could easily be refined by using an XSD complex type, it is unrealistic to assume that all services in an open environment will always be described with the appropriate level of detail. Moreover, should it be so, we cannot expect all service providers to always use type definitions expressed with the terms of reference adopted by a user.
Other relevant initiatives are DAML-S and BioMOBY, which we cannot describe here due to space constraints. Both approaches offer some form of semantic annotation, but are restrictive, in particular, because they are not compatible with the UDDI standard.
Having discussed the limitations of existing technologies, we now focus on the capabilities of our service directory. Specifically, we look at the ways of attaching ratings and functionality profiles to services, and semantic types to operation arguments. Our presentation is based on examples that were generated by dumping the contents of our service directory. The notation adopted in the presentation is N3 format.
In Figure 2, we show the representation of a service annotated by two numerical ratings, with different values, and provided by different authors at different times. The node b1 of the linear representation is an anonymous node denoting the service with the metadata attachment of type ``NumericRating''.
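The structure of such an annotation can be sketched as follows. This is only an approximation of the idea, not the actual contents of the service directory: the property names (`ex:hasMetadata`, `ex:value`, `ex:author`, `ex:date`), the authors and the dates are all hypothetical, and each attachment is modelled as a fresh anonymous node linked to the service node.

```python
from itertools import count
from typing import List, Tuple

Triple = Tuple[str, str, str]

_ids = count(1)  # generator of fresh anonymous-node identifiers


def attach_numeric_rating(triples: List[Triple], entity: str, value: float,
                          author: str, date: str) -> str:
    """Attach a NumericRating to `entity`, recording author and date."""
    node = f"_:rating{next(_ids)}"
    triples.extend([
        (entity, "ex:hasMetadata", node),
        (node, "rdf:type", "ex:NumericRating"),
        (node, "ex:value", repr(value)),
        (node, "ex:author", author),   # provenance: who rated
        (node, "ex:date", date),       # provenance: when
    ])
    return node


def ratings_for(triples: List[Triple], entity: str) -> List[Tuple[str, float]]:
    """Return (author, value) pairs for every NumericRating on `entity`."""
    found = []
    for s, p, node in triples:
        if (s == entity and p == "ex:hasMetadata"
                and (node, "rdf:type", "ex:NumericRating") in triples):
            props = {pp: oo for ss, pp, oo in triples if ss == node}
            found.append((props["ex:author"], float(props["ex:value"])))
    return found


triples: List[Triple] = []
# Two ratings of the same service (node _:b1) by different, made-up authors.
attach_numeric_rating(triples, "_:b1", 4.0, "alice", "2003-05-01")
attach_numeric_rating(triples, "_:b1", 2.5, "bob", "2003-06-15")
ratings = ratings_for(triples, "_:b1")
```

Keeping each rating on its own node lets several users rate the same service without overwriting one another.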
In myGrid, we describe services by a service profile specifying which kind of method they use (uses_method), which task they perform (perform_task), which resources they use (uses_resources) and what application they are wrapping (is_function_of). A relevant excerpt of the service directory contents is displayed in Figure 3, with b1 denoting a service and Pe577955b-d271-4a5b-8099-001abc1da633 the ``myGrid profile''.
In Figure 4, we show a semantic description of the parameter in0 declared in the interface of Figure 1. The node rdf:_1 denotes the message part with name in0. It is given a metadata attachment, with value mygrid2:nucleotide_sequence_data, which refers to a term in an ontology of bioinformatics concepts.
We have adopted RDF triples to represent all descriptions of services, which we store in a triple store. We have designed and implemented a set of interfaces to this triple store in order to offer a service directory functionality. In this section, we present the methods that are relevant to metadata attachment.
The interfaces to publish metadata and discover services according to metadata were designed in a similar style to the UDDI interface, so that UDDI clients could easily be extended to support such features. As an illustration, Figure 5 shows some of the methods that allow the attachment of metadata, respectively to a business service, to a business entity and to a message part. All these methods not only attach some metadata to the respective entity, but also add some provenance information such as author and date of creation. The associated metadata can be structured or unstructured. Symmetrically, services can be discovered by using a metadata filtering mechanism. An example of a metadata-based search method appears in Figure 5.
As all the information is represented in a triple store, a more direct interface to the triple store allows users to query the service directory using the RDQL query language. An API that allows users to store triples in the triple store is also provided.
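To give a feel for what a query over the triple store does, the sketch below evaluates a conjunction of triple patterns containing variables (written `?name`). It is not RDQL and not our implementation, only an assumed, simplified illustration of the kind of pattern matching such a query language performs; the `ex:` identifiers are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def query(triples: Iterable[Triple], patterns: List[Triple]) -> List[Binding]:
    """Return all variable bindings satisfying every pattern (a simple join)."""
    data = list(triples)
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [ext for b in bindings for t in data
                    if (ext := match_pattern(pattern, t, b)) is not None]
    return bindings


TRIPLES: List[Triple] = [
    ("ex:blast", "ex:hasOperation", "ex:op1"),
    ("ex:op1", "ex:hasInputPart", "ex:in0"),
    ("ex:in0", "ex:hasMetadata", "mygrid2:nucleotide_sequence_data"),
    ("ex:other", "ex:hasOperation", "ex:op2"),
    ("ex:op2", "ex:hasInputPart", "ex:p0"),
    ("ex:p0", "ex:hasMetadata", "ex:protein_sequence_data"),
]

# Which services have an operation taking nucleotide sequence data as input?
results = query(TRIPLES, [
    ("?service", "ex:hasOperation", "?op"),
    ("?op", "ex:hasInputPart", "?part"),
    ("?part", "ex:hasMetadata", "mygrid2:nucleotide_sequence_data"),
])
services = sorted({b["?service"] for b in results})
```

A single query of this form navigates from a service through its operations and message parts to the attached semantic types, something that cannot be expressed through the standard UDDI inquiry methods.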
Several interfaces currently provide access to our general information model. Some of them preserve compatibility with the existing UDDI standard, and ensure inter-operability within the Web Services community. Others, such as the interface to the triple store, directly expose the information model, and offer a powerful and radically different way of discovering services through the RDQL interface. While such functionality is very useful, its radically different nature does not offer a smooth transition for client implementors wishing to adopt semantic discovery.
The benefit of our approach is the ability to extend some existing interfaces in an incremental manner, so as to facilitate an easier transition to semantic discovery for existing clients. For instance, we have extended the UDDI find_service method to support queries over metadata that would have been attached to published services. In the method specification of Figure 5, metadataBag, a new criterion for filtering services is introduced, which contains a set of metadata that a service must satisfy.
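The effect of such a criterion can be sketched as follows. This is a deliberately simplified assumption about the filtering semantics (a service matches only if it carries every item of the bag), not the actual method signature: the real UDDI find_service takes further arguments that are omitted here, and `ServiceRecord`, the metadata values and the service name `ExampleAligner` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set, Tuple

Metadata = Tuple[str, str]  # (metadata type, value)


@dataclass
class ServiceRecord:
    name: str
    metadata: Set[Metadata] = field(default_factory=set)


def find_service(registry: Iterable[ServiceRecord],
                 name_pattern: Optional[str] = None,
                 metadata_bag: Optional[Iterable[Metadata]] = None) -> List[str]:
    """Return names of services matching the name filter and the metadata bag."""
    required = set(metadata_bag or ())
    hits = []
    for svc in registry:
        # Conventional filter: case-insensitive substring match on the name.
        if name_pattern is not None and name_pattern.lower() not in svc.name.lower():
            continue
        # Additional filter: every requested metadata item must be attached.
        if not required <= svc.metadata:
            continue
        hits.append(svc.name)
    return hits


registry = [
    ServiceRecord("BLAST", {("perform_task", "ex:similarity_search"),
                            ("NumericRating", "4")}),
    ServiceRecord("ExampleAligner", {("perform_task", "ex:alignment")}),
]

by_task = find_service(registry, metadata_bag=[("perform_task", "ex:similarity_search")])
unfiltered = find_service(registry)
no_match = find_service(registry, name_pattern="blast",
                        metadata_bag=[("perform_task", "ex:alignment")])
```

A client that omits the metadata bag sees the same behaviour as before, which is what makes the extension incremental.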
In this paper, we have presented a mechanism to publish semantic descriptions about services in order to promote semantic inter-operability. Our approach relies on a metadata attachment mechanism, capable of attaching metadata to any entity within a service description. Such metadata need not be published by service providers but can be published by third-party users. Our design extends the standard UDDI interface to provide semantic capabilities, thereby offering a smooth transition to semantic discovery for UDDI clients. We have used these facilities to register service descriptions as specified by the myGrid ontology. Our future work will focus on providing service descriptions to Grid services.
This research is funded in part by the EPSRC myGrid project (reference GR/R67743/01) and supported by GRIDNET. Keith Decker from the University of Delaware was on a sabbatical stay at the University of Southampton when this work was carried out. We acknowledge Carole Goble, Phillip Lord and Chris Wroe for their comments on the paper.