Ongoing Development of an Open Link Service for the World-Wide Web

Leslie Carr, David De Roure & Gary Hill
Multimedia Research Group, University of Southampton, UK


The Distributed Link Service is a distributed system of link services which can be used to provide a configurable view upon suites of static resources in the World-Wide Web. The use of independent link servers to provide increased usability for many classes of users has been demonstrated in the context of the World-Wide Web [Carr et al 95, Carr et al 96]. However, the engineering effort required to produce and maintain software that applies the available link services to a range of different client browsers, viewing applications and host operating systems is a serious disadvantage.

As an alternative it is possible to make the link service function transparently by integrating it into the document delivery service, accomplished by grafting the link service into a WWW proxy. This paper discusses the benefits to the user of such a system of backroom link services and compares it with the use of WWW transducers [Brooks et al] and open hypermedia shims [Davis et al 96].

Key words: Open hypermedia, link services, WWW


1. Introduction to Link Services

Open hypertext systems [Malcolm et al 91, Davis et al 92] aim to act as an underlying hypermedia link service (a term first used by Pearl [Pearl 89]), rather than provide a closed environment in which to present information. Such a link service aims to allow hypertext facilities to be accessed from any available application, thus acting as a service component of the user's environment. In order to provide such a facility, link information must be managed separately from documents, so that links may be applied to documents in any format [Davis et al 94].

The World Wide Web (WWW) is undoubtedly one of the more successful hypertext systems, but it is a largely closed system, dependent on the use of HTML document content for the provision of linking facilities. Although links may be created to documents other than those in HTML and image formats, such links are dead ends, and there is no way to follow any further links e.g. links from spreadsheet documents. There is also no way for additional links to be made available by third parties, as all link information is embedded in documents.

WWW embedded links and the external links provided by an open hypermedia system are described as locspecs and refspecs respectively, according to an extended version of the original Dexter model [Grønbæk & Trigg]. By applying refspecs to the WWW it is possible to employ an open hypertext approach to the authoring and management of World Wide Web hypertext documents [Hill et al 95] and to provide more flexible facilities. This paper will show how we have provided a link service for the WWW, based upon the model used in the Microcosm open hypertext system [Hill et al 93].

The development of open hypermedia systems has highlighted a number of advantages over closed systems which embed link information into documents. The most significant examples are briefly described below.

1.1 Reduced Authoring and Maintenance Effort

The most obvious advantage is the ability to improve authoring efficiency, and subsequently to aid in the maintenance of hypertext documents. Through the use of a wide range of link types [Davis et al 92], for example generic links and information retrieval facilities, it is possible to rapidly create a useful set of links.

In particular, the use of generic links allows common links to be authored only once - wherever the source selection of the link occurs, the link is available, including any documents subsequently made available. Typically such links would be created on names of people and places, or common terms, to provide access to more detailed information. In a closed system, such links need to be created wherever the source term appears in a document, and new documents also need to be linked into the system manually.
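
The behaviour of a generic link can be illustrated with a minimal sketch. It is not the DLS or Microcosm implementation: the GenericLink class, the links_for function and the example.org URLs are hypothetical, and naive case-insensitive substring matching stands in for whatever selection matching a real link service performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericLink:
    """A link whose source is a term, not a position in one document."""
    term: str         # source selection, e.g. the name of a person or place
    destination: str  # where the more detailed information lives


def links_for(text, linkbase):
    """Return every generic link whose source term occurs in the text."""
    lowered = text.lower()
    return [link for link in linkbase if link.term.lower() in lowered]


# Hypothetical linkbase: each link is authored exactly once.
linkbase = [
    GenericLink("Southampton", "http://example.org/places/southampton"),
    GenericLink("Microcosm", "http://example.org/systems/microcosm"),
]

old_doc = "Microcosm was developed at Southampton."
# A document added after the links were authored still picks them up.
new_doc = "A new report that mentions Southampton only."

for doc in (old_doc, new_doc):
    print([link.term for link in links_for(doc, linkbase)])
```

The point of the sketch is that new_doc needs no authoring effort at all: the link on "Southampton" applies to it as soon as the document is available.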

This form of linking also reduces maintenance requirements, as changes to links need only be made to the central link databases, and will immediately be effective wherever the link is available. This can reduce problems frequently encountered in the WWW, such as link fossilisation and decay [Hill et al 95]. Finally, a separate link database allows much more efficient automatic processing and editing of links.

1.2 Enhanced Reader Experience

Another aspect of a link service is the integrated interface offered. Rather than provide hypertext facilities as a single application, a link service aims to provide underlying hypertext facilities to the user's whole environment. Thus hypertext navigation may be used as a general method for managing and traversing the user's information space.

In addition, the type of linking described in the previous section allows the user a more flexible approach to link traversal. Rather than rely on those links highlighted by the system, the user is also able to select arbitrary items and query the system for possible links, thus creating a 'reader-led' navigation paradigm.

Readers may also be provided with the facilities necessary to create their own links, allowing them to annotate material which in other systems they would not be able to annotate and freeing them from a hypertext structure created purely by designated authors. If these databases may be shared with other users, collaborative authoring of hypertext resources is enhanced.

1.3 Alternative Views

A powerful mechanism made possible through the use of separate link management is the provision of multiple link databases for the user to select from. Thus the user may select link sets that reflect the context in which they wish to investigate a particular set of documents. For example, in a university department with a particular set of resources, students might select from tutorial-oriented linksets, whilst departmental staff might use an entirely different linkset to support research activities.

Another possibility is a separation between information provider and link provider. At present, hypertext material is usually delivered with links inextricably bound to the associated material. A link service can help to overcome this restriction, by providing the facility to apply completely different link sets to a set of documents, or conversely to apply existing links to new documents not available when the links were originally created. This makes it possible for third parties to offer pure linking services which end users may apply to any documents which they can access, breaking the common binding between content and link structure.

Finally, this facility can also aid in more efficient management of hypertextual information. If a variety of link structures are to be applied to a particular set of documents, changes to the document set are easier to make if the link information is managed separately. If link information had to be embedded in the documents, then many different document sets would have to be maintained in order to provide alternative link structures. Similarly, if new documents are introduced, existing link information need not be embedded in them to facilitate navigation; the links are immediately available.

2. A Simple Interactive Link Service

It is clear from the discussion above that the WWW, as it is used at present, is unable to provide many of the desirable features of a hypertext link service. However, the open nature of the framework upon which the WWW is based, in particular the ability to transfer arbitrary data between client and server, the extensibility of servers through the use of CGI utilities, and lately the ability for external processes to communicate with WWW browsers such as Netscape and HotJava, means that it is possible to extend the hypertext model of the WWW to support the more advanced features required of a link service. In effect the WWW infrastructure, now widespread and readily available, can be used as the communication framework for a distributed hypertext link service.

We have developed the Distributed Link Service (DLS) as such a system. It is able to work in conjunction with existing WWW resources to support an additional underlying link service, which is able to provide the features described in the previous section. This system is based upon our experiences developing the Microcosm hypertext system [Davis et al 94]. Like Microcosm, the DLS utilises a variety of link database processes to offer flexible hypertext functionality to a wide range of end-user applications.

The DLS [Carr et al 95] is composed of two parts: the server facilities, which are accessed via the WWW, and the client interface, which works in conjunction with a WWW browser.

2.1 Link Server

The link server facilities of the DLS, which were first implemented as CGI scripts invoked by a standard WWW server, are now implemented as modules of a pseudo-WWW server. This pseudo-server interacts with clients as if it were a normal WWW server, using enough of the Hypertext Transfer Protocol (HTTP) to allow normal interaction with a browser, but it does not store or return any documents. Instead, modules are available to allow the creation, traversal and editing of links, which are stored in a number of link databases. The databases use an SGML style mark-up, and record the source and destination attributes of the link, the type of the link, its creation time and a link description.
Figure 1a: A user requests a link from the link service using the client interface
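
To make the shape of a link database entry concrete, the following sketch parses a record carrying the five fields listed above. The element and attribute names (link, source, dest, desc, type, created) are assumptions made for illustration; the paper does not give the actual DLS mark-up, and a regular expression is used here in place of a real SGML parser.

```python
import re
from dataclasses import dataclass


@dataclass
class Link:
    source: str
    destination: str
    type: str
    created: str
    description: str


# Hypothetical record layout: the real DLS element names are not known.
SAMPLE = """
<link type="generic" created="1996-05-01T12:00:00">
<source>Microcosm</source>
<dest>http://example.org/microcosm.html</dest>
<desc>Overview of the Microcosm system</desc>
</link>
"""

LINK_RE = re.compile(
    r'<link\s+type="(?P<type>[^"]*)"\s+created="(?P<created>[^"]*)">\s*'
    r'<source>(?P<source>.*?)</source>\s*'
    r'<dest>(?P<destination>.*?)</dest>\s*'
    r'<desc>(?P<description>.*?)</desc>\s*'
    r'</link>',
    re.DOTALL,
)


def parse_linkbase(text):
    """Return one Link per <link> record found in the text."""
    return [Link(**match.groupdict()) for match in LINK_RE.finditer(text)]


for link in parse_linkbase(SAMPLE):
    print(link.type, link.source, "->", link.destination)
```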

The system supports several categories of link database. At the most general level are server databases, which apply whenever the system is queried. Link databases may also be provided for a group of documents, or a particular document. In addition, a variety of 'context' link databases are available from which the user may select. By choosing a different context, the user may adjust the available link set to best suit their current information requirements. The user is also provided with a personal link database in which they may create private links that only they have access to.

The server receives details from the DLS client of the user's selection, the document in which the selection was made, and the context selected. The followlink module determines which link databases are required, and gathers these together to satisfy the request. Like Microcosm, the system supports the use of generic links, which allows links to be applicable beyond the scope in which they were originally created.
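
The gathering step performed by followlink might be sketched as follows. This is an illustration under stated assumptions rather than the module itself: the registry layout, the matching of document groups by URL prefix, the ordering from general to specific, and the convention that a link with no document is generic are all hypothetical.

```python
def select_linkbases(document_url, context, user, registry):
    """Gather the linkbases that apply to a request, general to specific."""
    selected = list(registry["server"])           # always consulted
    for prefix, bases in registry["group"].items():
        if document_url.startswith(prefix):       # assumed: groups are URL prefixes
            selected.extend(bases)
    selected.extend(registry["document"].get(document_url, []))
    selected.extend(registry["context"].get(context, []))
    selected.extend(registry["personal"].get(user, []))
    return selected


def follow_link(selection, document_url, context, user, registry):
    """Return the destinations of all links applicable to the selection."""
    results = []
    for linkbase in select_linkbases(document_url, context, user, registry):
        for link in linkbase:
            generic = link["document"] is None    # generic links apply anywhere
            if link["source"] == selection and (
                generic or link["document"] == document_url
            ):
                results.append(link["destination"])
    return results


# Hypothetical data: each linkbase is a list of link dictionaries.
registry = {
    "server": [[{"source": "Southampton", "document": None,
                 "destination": "http://example.org/soton"}]],
    "group": {"http://example.org/docs/": [[
        {"source": "DLS", "document": None,
         "destination": "http://example.org/dls"}]]},
    "document": {},
    "context": {"tutorial": [[
        {"source": "DLS", "document": "http://example.org/docs/a.html",
         "destination": "http://example.org/tutorial/dls"}]]},
    "personal": {},
}

print(follow_link("DLS", "http://example.org/docs/a.html",
                  "tutorial", "alice", registry))
```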

The editlink module provides an HTML form which allows the user to select from the available link databases and edit the links they contain, for example by changing the default link description or updating the type of a link. The createlink module accepts details of start and end points for a link, and enters a new link into the specified context link database if there is one. Otherwise, the link is entered into the user's personal link database. The context module provides a list of the different context link databases available on the server. This can be used by the client to present a menu of contexts to the user.

2.2 Client Interface

A DLS client is available for PC, Mac and UNIX platforms, and is a simple utility which formulates DLS requests, and communicates these to the selected link server via a WWW browser. The client reacts to a DLS request from the user by extracting details of the selection the user has made, the document in which this selection is found, and any current context selected. This information is encapsulated as an HTTP request and communicated to the WWW client browser using the platform's appropriate IPC facilities. The results of the DLS request are returned by the link server to Netscape, which may then present the results to the user. For example, the result of a Follow Link request might offer a list of appropriate links, or indicate that no links were found.
Figure 1b: The link server responds with a page of available destinations
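
The encapsulation of a request can be sketched with the Python standard library. The server address, the followlink path and the parameter names (selection, document, context) are assumptions for illustration; the paper does not specify the actual request syntax used by the DLS client.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_followlink_url(server, selection, document, context=None):
    """Encode the user's selection, document and context as an HTTP GET URL."""
    params = {"selection": selection, "document": document}
    if context:
        params["context"] = context
    return f"{server}/followlink?{urlencode(params)}"


# Hypothetical link server and document.
url = build_followlink_url(
    "http://linkserver.example.org",
    selection="open hypermedia",
    document="http://example.org/docs/intro.html",
    context="tutorial",
)
print(url)

# The server side can recover the three fields from the query string.
print(parse_qs(urlsplit(url).query))
```

A client built this way only needs to hand the resulting URL to the browser, which then issues the request and displays the response like any other page.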

3. An Interfaceless Proxy Link Service

A client of the link server can extract details from any application (not just a Web browser) and create link requests which are passed on to the link server via a Web browser. The DLS can therefore provide links for data maintained by applications which may not have their own hypertext linking mechanisms. Alternatively, it is possible to combine the chosen links with the original document, if the document's format is capable of representing hypertext links. Hence an option for producing hypertext material is to develop it using the interactive clients described above, and then to compile a chosen set of link databases into a specific set of document resources (currently in HTML, RTF or PDF format) which will then be independent of the link service. By varying the compilation parameters, different webs may be produced over similar material for different audiences.

A major problem with the interactive client is the engineering effort required to produce and maintain software that applies the available link services to a range of different viewing applications using a variety of WWW browsers on a range of different host operating systems. Hence an alternative, 'interfaceless' approach was investigated: to make the link service transparent to its users by embedding it in the Web's document transport system, compiling links into documents as they are delivered to the user by a specially adapted WWW proxy server.

This approach requires no extra client software for the user, which is an immediate practical benefit, but it does suffer from a number of disadvantages. Firstly, the loss of interaction makes it impossible to create a link by the usual method of making a selection and choosing Start Link from the menu. It also changes (perhaps for the worse) the browsing paradigm from "reader-directed enquiry" to "click on a predefined choice" [Hall 94]. Secondly, this behind-the-scenes link compilation is applicable only to documents which are delivered via the WWW and which are coded in well-understood document formats that can themselves support some form of hypertext link. These requirements abandon some of the advantages of the open system previously described, since there are relatively few document formats which can have links embedded.

3.1 Link Service Architecture

This section describes our general model for link resolution on a network. The diagram in figure 2 depicts a user sending a "followlink" query to a process, which in turn consults other processes; this extends recursively. This client-server approach leads to the tree structure shown. In reality some of the nodes may reside on the same physical processor or even process, but the diagram depicts the logical structure in the processing of the query.
Figure 2: Link Service Resolution Architecture

There are three kinds of data that can move between nodes in this diagram: the query, link data or the results of resolving the query. The nodes represent link processing agents, of which the link resolution agent (LRA), with local link data, is the only instance encountered so far; others will be introduced later.

In the simplest scenario, the link data is static: the query travels from left to right, is resolved at a node or nodes with the appropriate link data, and the results travel from right to left back to the user. There are three types of nodes which may exist separately or in combination:

1. LRAs. These resolve the query against local link data and return the result.

2. Caches. The same query will return the same results from the same linkbases, so it is possible for processes to cache the result of queries in order to speed up response and promote scalability.

3. LRA proxies. These processes appear as LRAs but propagate the query to other nodes and aggregate the responses. There are two roles for these: implementing concurrent processing of queries, and providing redundancy to cope with failure of parts of the system.
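
The three node types can be sketched as classes sharing a single resolve operation, so that they compose into the tree of figure 2. This is an illustrative sketch only: the class names, the dictionary linkbase and the DownNode used to simulate failure are hypothetical, and the proxy here consults its upstream nodes one after another, whereas a real LRA proxy could query them concurrently.

```python
class LRA:
    """Link resolution agent: resolves a query against local link data."""

    def __init__(self, linkbase):
        self.linkbase = linkbase  # dict: source term -> list of destinations

    def resolve(self, query):
        return list(self.linkbase.get(query, []))


class Cache:
    """Remembers results, relying on identical queries giving identical results."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}
        self.misses = 0

    def resolve(self, query):
        if query not in self.store:
            self.misses += 1
            self.store[query] = self.upstream.resolve(query)
        return list(self.store[query])


class LRAProxy:
    """Looks like an LRA but forwards the query and aggregates the answers."""

    def __init__(self, upstreams):
        self.upstreams = upstreams

    def resolve(self, query):
        results = []
        for node in self.upstreams:
            try:
                answers = node.resolve(query)
            except OSError:
                continue  # redundancy: skip a failed node, keep the rest
            for destination in answers:
                if destination not in results:
                    results.append(destination)
        return results


class DownNode:
    """Stands in for an unreachable link server."""

    def resolve(self, query):
        raise ConnectionError("link server unreachable")


local = LRA({"Microcosm": ["http://example.org/microcosm"]})
remote = LRA({"Microcosm": ["http://example.org/remote/microcosm"],
              "DLS": ["http://example.org/dls"]})
root = Cache(LRAProxy([local, DownNode(), remote]))

print(root.resolve("Microcosm"))
print(root.resolve("Microcosm"), "cache misses:", root.misses)
```

Because every node exposes the same operation, a client need not know whether it is talking to a single LRA or to the root of a larger tree.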

The simple scenario extends naturally to the case where the link data is itself mobile. Here, the LRAs can request the link data from link data servers, which may be identical to document servers: link data is just a special document type. This means that link data can be cached using the same techniques as for the caching of document data. The link data server is then another process type in the diagram, and there is a new type of LRA which can import link data from these servers.

Finally, the LRA itself may be mobile. Instead of the data moving to the agent, the agent can then move to the data. This model is a topic of research but has not been realised in any practical DLS implementation at this time.

Note that a simple client might talk to the first process in the diagram, but more sophisticated clients are possible which incorporate the functionality of any of the process types discussed above; these are "heavyweight clients". For example, the client may have knowledge about which link server to contact and may itself implement some concurrency or fault tolerance. In particular, it might have link resolution functionality so that link resolution can occur when the user is offline; this is particularly appropriate when the user is using mobile equipment.

3.2 Link Control Panel

Since the interfaceless link server requires at least some initial configuration (for example choosing applicable sets of link databases), a method for communicating with the server has been developed. This takes the form of a kind of "link remote controller", which is an HTML form displayed as a separate window and whose results are interpreted by a module in the link server.
Figure 3: Link Service "Remote Controller"

The purpose of the controller is to give the user the ability to choose how links are selected and displayed within the processed documents. The simple control panel in figure 3 gives the user the ability to choose which of the server's installed linkbases are to be combined with requested documents, as well as the opportunity to choose whether the links are displayed by underlining the link source text (the default), by inserting asterisks after the link source text (a footnote style) or by inserting citation markers after the source text and then appending a "link bibliography" to the document as a whole. It is also possible to completely bypass the link compilation if a "normal" document viewing mode is required.
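
The three display styles, plus the bypass mode, can be sketched as a small function that compiles links into a piece of plain text. This is a minimal sketch assuming HTML output and exact phrase matching; the function name compile_links, the style names and the layout of the link bibliography are hypothetical and are not taken from the DLS.

```python
import html
import re

STYLES = ("underline", "asterisk", "citation", "none")


def compile_links(text, links, style="underline"):
    """Insert (phrase, url) links into plain text, returning HTML."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    if style == "none" or not links:
        return html.escape(text)           # "normal" viewing mode
    targets = dict(links)
    numbers = {phrase: n for n, phrase in enumerate(targets, start=1)}
    # Longest phrases first so that overlapping phrases prefer the longer one.
    pattern = re.compile(
        "|".join(re.escape(p) for p in sorted(targets, key=len, reverse=True))
    )
    pieces, cited, pos = [], [], 0
    for match in pattern.finditer(text):
        phrase = match.group(0)
        url = html.escape(targets[phrase])
        shown = html.escape(phrase)
        pieces.append(html.escape(text[pos:match.start()]))
        if style == "underline":           # anchor on the source text itself
            pieces.append(f'<a href="{url}">{shown}</a>')
        elif style == "asterisk":          # footnote-style marker after the text
            pieces.append(f'{shown}<a href="{url}">*</a>')
        else:                              # citation marker plus bibliography
            n = numbers[phrase]
            pieces.append(f'{shown}<a href="#link{n}">[{n}]</a>')
            if phrase not in cited:
                cited.append(phrase)
        pos = match.end()
    pieces.append(html.escape(text[pos:]))
    for phrase in cited:                   # the appended "link bibliography"
        n, url = numbers[phrase], html.escape(targets[phrase])
        pieces.append(f'<p id="link{n}">[{n}] <a href="{url}">{url}</a></p>')
    return "".join(pieces)


links = [("Microcosm", "http://example.org/m")]
for style in STYLES:
    print(style, "->", compile_links("The Microcosm system", links, style))
```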

The controller establishes a dynamic session (a binding of a user and host together with a set of link server parameters) which is used to control the behaviour of the link server from that point in time onwards for that particular user. It is intended that the user will invoke the controller just once to set their preferred configuration, and only again afterwards to adjust the configuration; links will always be added automatically to the documents according to the last settings of the controller.

A more complex control panel provides a greater degree of control over the linking process. This enables the user to specify in some detail which link databases are switched on and off as the user browses in and out of a number of document resources, to control the kinds of linkbase that are used at such a point (e.g. internal navigation through a resource vs citation of documents external to the resource) and to determine how the server is to cull links from a potentially over-annotated document. The Open Journal Framework [Carr et al 96] makes use of the control panel to help the user navigate through large suites of collected but separate Internet resources, all integrated by the use of linkbases. Since at any one time there may be many dozens of link databases active, providing links of various levels of 'pertinence', an important task of the server is to throttle over-zealous link producers and allow the user to choose the overall proportion of link items to document content, the maximum number of links to appear on each key phrase, the maximum number of links to a particular resource and which link authors are to be given preference. All these can be controlled from a more complicated version of the standard session's control panel.
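
A possible shape for the culling step is sketched below. It is hypothetical throughout: the link dictionaries, the parameter names and the identification of a "resource" with the host part of the destination URL are assumptions made for illustration, and the overall proportion of links to document content is left out for brevity.

```python
from urllib.parse import urlsplit


def cull(links, max_per_phrase=2, max_per_resource=3, preferred_authors=()):
    """Keep a bounded number of links, favouring preferred authors."""
    rank = {author: i for i, author in enumerate(preferred_authors)}
    # sorted() is stable, so links from equally ranked authors keep their order.
    ordered = sorted(links, key=lambda l: rank.get(l["author"], len(rank)))
    per_phrase, per_resource, kept = {}, {}, []
    for link in ordered:
        phrase = link["phrase"]
        resource = urlsplit(link["destination"]).netloc  # assumed: resource = host
        if per_phrase.get(phrase, 0) >= max_per_phrase:
            continue
        if per_resource.get(resource, 0) >= max_per_resource:
            continue
        per_phrase[phrase] = per_phrase.get(phrase, 0) + 1
        per_resource[resource] = per_resource.get(resource, 0) + 1
        kept.append(link)
    return kept


candidates = [
    {"phrase": "hypertext", "destination": "http://a.example.org/1", "author": "bob"},
    {"phrase": "hypertext", "destination": "http://b.example.org/1", "author": "alice"},
    {"phrase": "hypertext", "destination": "http://c.example.org/1", "author": "carol"},
    {"phrase": "link", "destination": "http://b.example.org/2", "author": "bob"},
]

for link in cull(candidates, max_per_phrase=2, max_per_resource=1,
                 preferred_authors=["alice"]):
    print(link["author"], link["phrase"], link["destination"])
```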

By introducing a model of Internet resources (collections of documents and associated link databases) and aggregations of these resources (collections of collections of documents and associated link databases), it is possible to define the user's "static location" in a document space, and hence to know what hypertext actions are applicable at what point in that document space. If the user travels outside all known resources (e.g. to a colleague's personal home page), they have the option of still applying the most general links or to have the link server refrain from applying any links. Without this model (in the case of the simple controller) the same sets of link databases are applied to any document which the user sees.
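
One way to picture the resource model is as a lookup from the current URL to the linkbases in force there. In this sketch a resource is identified by a URL prefix and an aggregation by a shorter enclosing prefix; that representation, the linkbase names and the example URLs are all assumptions made for illustration and are not described in the paper.

```python
# Hypothetical resources: URL prefix -> linkbases associated with that resource.
RESOURCES = {
    "http://journals.example.org/": ["journal-general"],
    "http://journals.example.org/biology/": ["biology-internal", "biology-citations"],
}
GENERAL = ["server-general"]  # the most general links


def linkbases_for(url, apply_general_outside=True):
    """Return the linkbases applicable at the user's location in document space."""
    enclosing = [prefix for prefix in RESOURCES if url.startswith(prefix)]
    if not enclosing:
        # Outside all known resources: apply general links only, or none at all.
        return list(GENERAL) if apply_general_outside else []
    bases = list(GENERAL)
    for prefix in sorted(enclosing, key=len):  # aggregation first, then resource
        bases.extend(RESOURCES[prefix])
    return bases


print(linkbases_for("http://journals.example.org/biology/paper1.html"))
print(linkbases_for("http://home.example.net/~colleague/"))
print(linkbases_for("http://home.example.net/~colleague/", apply_general_outside=False))
```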

4. Related Work

The concept of external link services has been a familiar part of the hypertext research community for many years, especially in the area of Open Hypermedia Systems. Although the WWW is a closed system according to OHS classifications, the provision of a link service provides a degree of openness, along with the Web's own use of external document viewers. According to the flag taxonomy of open hypermedia systems [Østerbye & Wiil] the DLS is an incomplete session manager, taking on the responsibility of link availability, i.e. controlling which links should appear as link markers. Link activation, the other normal responsibility of a session manager, is usually handled by the WWW browser itself, although it is possible to make the DLS perform this task by selecting 'indirect' links from the control panel in figure 3.

The use of an Open Hypermedia Protocol (OHP) for such an environment has been recently discussed [Davis et al 96]. The architecture to support this uses shims to convert between the native protocols of a link service and the native protocols of a client application so as to allow an application to make use of many different link services. A client's shim communicates with each server's shim by using the OHP standard and so receives its linking information independently of the implementation of the link service.

The shims work is not directly comparable with the situation on the WWW, as a native client of the WWW speaks a standard protocol to a WWW server not to receive linking information, but to receive a document (which indirectly contains links). The DLS server masquerades as a WWW server in order to resolve an explicit request for a document and an implicit request for links, and to translate the results back into the data that the client is expecting, i.e. a document. This can be seen as a variant on the standard OHP architecture where the client shim is actually co-located with the link server. In this situation the link server has the additional responsibilities of procuring the document and merging the links into the document; in the OHP scenario these two tasks are accomplished by the client. Although the two situations are not exactly congruent, it is possible that OHP could be effectively used between the components of the distributed server described in section 3.1.

Research in the WWW community [Brooks et al] has been focussed on the use of transducers which intercept the flow of communication between a client and server, modifying the request or response in some way. Such transducers have been used to experiment with adding extra functionality to the document server in the form of annotations, indexes and change marks [Meeks et al]. The DLS server acts very similarly, modifying the WWW server response (the document) by adding extra data to it, but for reasons of efficiency it does so as a "mutant proxy server" rather than an extra processing node on the communications stream. The DLS also breaks the transducer model by allowing the client to communicate directly with it by using the "Link Remote Controller" described in section 3.2.

Others have investigated the use of independent meta-information servers but to provide collaborative annotations for WWW pages rather than links [Röscheisen et al]. This is actually quite similar in concept to the DLS, as links can be easily provided within an annotation framework (in fact the DLS provides support for annotations through the inclusion of extra metadata in the link databases).

Conclusions

A hypermedia link service provides important functionality for any information system. In conjunction with the Web it provides a powerful tool with which to address many of the restrictions often experienced with traditional Web services, offering easier information maintenance and enhanced authoring capability. We have shown how a simple link service can be implemented using standard Web browsers and servers and how it can be implemented without the need for additional client software, and we have described architectures, based on proxies and redirection, which permit a link service to scale to multiple hosts.

We are continuing to develop the Distributed Link Service following our open hypermedia philosophy, adopting new browser and server technologies as they become available. Future work includes an investigation of client-side link resolution (the 'heavyweight client'), link caching on proxies, multicasting to multiple link servers and experiments on controlling the presentation of links. Tools which utilise the link service are being designed within specific projects, and we hope to make generic tools available in the future.

Acknowledgements

The work described in this paper was partially supported by JISC grant ELP2/35 and a UK ROPA award.

Bibliography

[Brooks et al] Application-Specific Proxy Servers as HTTP Stream Transducers, C. Brooks, M. Mazer, S. Meeks and J. Miller, Proceedings of The Web Revolution: Fourth International World Wide Web Conference, in The Web Journal 1(1), O'Reilly and Associates.

[Carr et al 95] The Distributed Link Service: A Tool for Publishers, Authors and Readers, L. Carr, D. De Roure, W. Hall and G. Hill, Proceedings of The Web Revolution: Fourth International World Wide Web Conference, in The Web Journal 1(1), O'Reilly and Associates.

[Carr et al 96] Open Linking Services, L. Carr, D. De Roure, W. Hall and G. Hill, Proceedings of the Fifth International World Wide Web Conference.

[Davis et al 92] H. Davis, W. Hall, I. Heath, G. Hill, R. Wilkins, Towards an Integrated Information Environment with Open Hypermedia Systems, in ECHT '92, Proceedings of the Fourth ACM Conference on Hypertext, Milan, Italy, November 30-December 4, 1992, ACM Press, 181-190.

[Davis et al 94] H. Davis, S. Knight, W. Hall, Light Hypermedia Link Services: A Study of Third Party Application Integration, in Proceedings of the Sixth ACM Conference on Hypertext, Edinburgh, Scotland, September 1994, ACM Press, 41-50.

[Davis et al 96] H. Davis, A. Lewis, A. Rizk, OHP: A Draft Proposal for an Open Hypermedia Protocol, presented at ACM Hypertext 96 Conference, Open Hypermedia Systems Workshop, <URL: http://diana.ecs.soton.ac.uk/~hcd/protweb.htm>

[De Roure et al 96] A Distributed Hypermedia Link Service, D. De Roure, L. Carr, W. Hall and G. Hill, Proceedings of the Third International Workshop on Services in Distributed and Networked Environments (SDNE96), IEEE Computer Society Press, 1996.

[De Roure et al 96b] Agents for Distributed Multimedia Information Management, D. De Roure, W. Hall, H. Davis and J. Dale, Proceedings of PAAM'96

[Grønbæk & Trigg] Toward a Dexter-based reference model for open hypermedia: Unifying embedded references and link objects, K. Grønbæk, R. Trigg, in the Proceedings of the Seventh ACM Conference on Hypertext, 1996.

[Hall 94] W. Hall, Ending the Tyranny of the Link, IEEE Multimedia 1,1 pp 60-68 (1994).

[Hill et al 93] G. Hill, R. Wilkins, W. Hall, Open and Reconfigurable Hypermedia Systems: A Filter Based Model, Hypermedia, 5(2), 1993.

[Hill et al 95] Applying Open Hypertext Principles to the WWW, G. Hill, W. Hall, D. De Roure, L. Carr, International Workshop on Hypermedia Design 1995, Montpellier, France, 1-2 June 1995.

[Malcolm et al 91] Malcolm, K.C., Poltrock, S.E., Schuler, D. Industrial Strength Hypermedia: Requirements for a Large Engineering Enterprise. In: Hypertext 91: Proceedings of Third ACM Conference on Hypertext, San Antonio, TX. ACM Press, 1991, 13-24.

[Meeks et al] Transducers and Associates: Circumventing the Limitations of the World Wide Web, W. Meeks, C. Brooks, M. Mazer, in Proceedings of the Conference on Emerging Technologies and Applications in Communications 96, Portland, Oregon.

[Østerbye & Wiil] The Flag Taxonomy of Open Hypermedia Systems, K. Østerbye, U. Wiil, in the Proceedings of the Seventh ACM Conference on Hypertext, 1996.

[Pearl 89] Pearl, A. Sun's Link Service: A Protocol for Open Linking. In: Hypertext '89 Proceedings, Pittsburgh PA, 1989, 137 - 146

[Röscheisen et al] Shared Web Annotations as a Platform for Third-party Value-Added Information Providers: Architecture, Protocols and Usage Examples, M. Röscheisen, C. Mogensen, T. Winograd, Technical Report CSDTR/DLTR, Computer Science Department, Stanford University, Stanford, CA 94305, USA.
<URL: http://www-diglib.stanford.edu/rmr/TR/TR.html>