Web Links as User Artefacts

Leslie Carr, David De Roure, Gary Hill & Wendy Hall

Multimedia Research Group, University of Southampton, UK

Abstract

The Distributed Link Service, a distributed system of link services for the World Wide Web [Carr et al 95, Carr et al 96], has been used to impose configurable navigation structures upon suites of static document resources on the Web. This paper describes the integration of the link service functionality into a Web proxy server, and examines the implications for the service's user interface in terms of including, representing, discriminating, prioritising and traversing links.

Key words: Proxy services, hypertext links, HTTP stream transducers, open hypermedia

1. Introduction to Link Services

Open hypertext systems [Malcolm et al 91, Davis et al 92] aim to act as an underlying hypermedia link service (a term first used by Pearl [Pearl 89]), rather than provide a closed environment in which to present information. Such a link service aims to allow hypertext facilities to be accessed from any available application, thus acting as a service component of the user's environment. In order to provide such a facility, link information must be managed separately from documents, so that links may be applied to documents in any format [Davis et al 94].

The World Wide Web (WWW) is undoubtedly one of the more successful hypertext systems, but it is a largely closed system, dependent on the use of HTML document content for the provision of linking facilities. Although links may be created to documents other than those in HTML and image formats, such links are dead ends: there is no way to follow any further links from, for example, a spreadsheet document. There is also no way for additional links to be made available by third parties, as all link information is embedded in documents.

WWW embedded links and the external links provided by an open hypermedia system are described as locspecs and refspecs respectively, according to an extended version of the original Dexter model [Grønbæk & Trigg]. By applying refspecs to the WWW it is possible to employ an open hypertext approach to the authoring and management of World Wide Web hypertext documents [Hill et al 95] and to provide more flexible facilities. This paper will show how we have provided a link service for the WWW, based upon the model used in the Microcosm open hypertext system [Hill et al 93].

The development of open hypermedia systems has highlighted a number of advantages over closed systems which embed link information into documents. The most significant examples are briefly described below.

1.1 Reduced Authoring and Maintenance Effort

The most obvious advantage is the ability to improve authoring efficiency, and subsequently to aid in the maintenance of hypertext documents. Through the use of a wide range of link types [Davis et al 92], for example generic links and information retrieval facilities, it is possible to rapidly create a useful set of links.

In particular, the use of generic links allows common links to be authored only once - wherever the source selection of the link occurs, the link is available, including any documents subsequently made available. Typically such links would be created on names of people and places, or common terms, to provide access to more detailed information. In a closed system, such links need to be created wherever the source term appears in a document, and new documents also need to be linked into the system manually.

This form of linking also reduces maintenance requirements, as changes to links need only be made to the central link databases, and will immediately be effective wherever the link is available. This can reduce problems frequently encountered in the WWW, such as link fossilisation and decay [Hill et al 95]. Finally, a separate link database allows much more efficient automatic processing and editing of links.
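
The generic-link idea described above can be illustrated with a minimal sketch. It assumes a link is simply a (term, destination) pair held in one central list; the names GenericLink, LINKBASE and find_links, and the example.org URLs, are hypothetical and are not taken from any DLS implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericLink:
    term: str         # source selection: the link applies wherever this text occurs
    destination: str  # where the more detailed information lives


# Hypothetical central link database: each link is authored exactly once.
LINKBASE = [
    GenericLink("Southampton", "http://example.org/places/southampton"),
    GenericLink("hypertext", "http://example.org/glossary/hypertext"),
]


def find_links(text, linkbase=LINKBASE):
    """Return (offset, link) pairs for every occurrence of every link's term."""
    hits = []
    for link in linkbase:
        pattern = r"\b" + re.escape(link.term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), link))
    return sorted(hits, key=lambda hit: hit[0])


# The same linkbase serves a document written after the links were authored.
old_doc = "Open hypertext research at Southampton."
new_doc = "A later document that also mentions hypertext."
```

Editing or removing an entry in LINKBASE changes the links found in every document at once, which is the maintenance benefit claimed for a separate link database.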

1.2 Enhanced Reader Experience

Another aspect of a link service is the integrated interface offered. Rather than provide hypertext facilities as a single application, a link service aims to provide underlying hypertext facilities to the user's whole environment. Thus hypertext navigation may be used as a general method for managing and traversing the user's information space.

In addition, the type of linking described in the previous section allows the user a more flexible approach to link traversal. Rather than rely on those links highlighted by the system, the user is also able to select arbitrary items and query the system for possible links - thus creating a 'reader-led' navigation paradigm.

Readers may also be provided with the facilities necessary to create their own links, allowing them to annotate material which in other systems they would not be able to annotate and freeing them from a hypertext structure created purely by designated authors. If these databases may be shared with other users, collaborative authoring of hypertext resources is enhanced.

1.3 Alternative Views

A powerful mechanism made possible through the use of separate link management is the provision of multiple link databases for the user to select from. Thus the user may select link sets that reflect the context in which he wishes to investigate a particular set of documents. For example, in a university department with a particular set of resources, students might select from tutorial-oriented linksets, whilst departmental staff might use an entirely different linkset to support research activities.

Another possibility is a separation between information provider and link provider. At present, hypertext material is usually delivered with links inextricably bound to the associated material. A link service can help to overcome this restriction, by providing the facility to apply completely different link sets to a set of documents, or conversely to apply existing links to new documents not available when the links were originally created. This makes it possible for third parties to offer pure linking services which end users may apply to any documents which they can access, breaking the common binding between content and link structure.

Finally, this facility can also aid in more efficient management of hypertextual information. If a variety of link structures are to be applied to a particular set of documents, changes to the document set are easier to make if the link information is managed separately. If link information had to be embedded in the documents, then many different document sets would have to be maintained in order to provide alternative link structures. Similarly, if new documents are introduced, existing link information need not be embedded in them to facilitate navigation; the links are immediately available.

2. A Simple Interactive Link Service

It is clear from the discussion above that the WWW, as it is used at present, is unable to provide many of the desirable features of a hypertext link service. However, the open nature of the framework upon which the WWW is based, in particular the ability to transfer arbitrary data between client and server, the extensibility of servers through the use of CGI utilities, and lately the ability for external processes to communicate with WWW browsers such as Netscape and HotJava, means that it is possible to extend the hypertext model of the WWW to support the more advanced features required of a link service. In effect the WWW infrastructure, now widespread and readily available, can be used as the communication framework for a distributed hypertext link service.

We have developed the Distributed Link Service (DLS) as such a system. It is able to work in conjunction with existing WWW resources to support an additional underlying link service, which is able to provide the features described in the previous section. This system is based upon our experiences developing the Microcosm hypertext system [Davis et al 94]. Like Microcosm, the DLS utilises a variety of link database processes to offer flexible hypertext functionality to a wide range of end-user applications.

The DLS [Carr et al 95] is composed of two parts: the server facilities, which are accessed via the WWW, and the client interface, which works in conjunction with a WWW browser.

The link server facilities of the DLS are implemented as modules of a pseudo-WWW proxy server. The proxy implements enough of the Hypertext Transfer Protocol (HTTP) to allow normal interaction with a browser, but also contains modules to allow the creation, traversal and editing of links, which are stored in a number of link databases. The databases use an SGML-style mark-up, and record the source and destination attributes of the link, the type of the link, its creation time and a link description.
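
As an illustration only, a link record carrying the five pieces of information listed above might be written out and read back as follows. The element names (link, source, destination, type, time, description) are assumptions made for this sketch; the actual DLS mark-up is not reproduced here, and no character escaping is attempted.

```python
import re

# Hypothetical element names, one per attribute the link databases record.
FIELDS = ("source", "destination", "type", "time", "description")


def to_markup(link):
    """Serialise a link (a dict keyed by FIELDS) as SGML-style mark-up."""
    lines = ["<link>"]
    for name in FIELDS:
        lines.append(f"  <{name}>{link[name]}</{name}>")
    lines.append("</link>")
    return "\n".join(lines)


def from_markup(text):
    """Parse one <link> element back into a dict; missing fields become ''."""
    link = {}
    for name in FIELDS:
        match = re.search(rf"<{name}>(.*?)</{name}>", text, flags=re.DOTALL)
        link[name] = match.group(1).strip() if match else ""
    return link


example = {
    "source": "Microcosm",                              # selection the link starts from
    "destination": "http://example.org/microcosm.html",  # hypothetical URL
    "type": "generic",
    "time": "1996-11-01T12:00:00",
    "description": "Overview of the Microcosm system",
}
```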

The stored links (once chosen by the server) are combined with the original document, if the document's format is capable of representing hypertext links. By varying the compilation parameters, different webs may be produced over similar material for different audiences. This is invisible to the end user, as the process occurs in the Web's document transport system, compiling links into documents as they are delivered to the user by the specially adapted WWW proxy server.
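
The compilation step can be sketched for the HTML case as below. This is a simplified illustration under stated assumptions, not the DLS algorithm: links are a plain mapping from a term to a destination URL, matching is case-sensitive, and text already inside an author's anchor is left alone. The function name compile_links is hypothetical.

```python
import re


def compile_links(html, links):
    """Wrap occurrences of each term in `links` (term -> URL) in an anchor,
    touching only text that lies outside tags and outside existing anchors."""
    if not links:
        return html
    terms = sorted(links, key=len, reverse=True)  # prefer the longest term
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    out = []
    inside_anchor = False
    # re.split with a capturing group keeps the tags as separate pieces.
    for piece in re.split(r"(<[^>]+>)", html):
        if piece.startswith("<"):
            tag = piece.lower()
            if re.match(r"<a[\s>]", tag):
                inside_anchor = True
            elif tag.startswith("</a"):
                inside_anchor = False
            out.append(piece)
        elif inside_anchor:
            out.append(piece)  # leave the author's own link untouched
        else:
            out.append(pattern.sub(
                lambda m: f'<a href="{links[m.group(1)]}">{m.group(1)}</a>',
                piece))
    return "".join(out)
```

Calling the function with a different `links` mapping over the same document produces a different web, which corresponds to varying the compilation parameters for different audiences.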

3. Link Service Architecture

This section describes our general model for link resolution on a network. The diagram in figure 2 depicts a user sending a "followlink" query to a process, which in turn consults other processes; this extends recursively. This client-server approach leads to the tree structure shown. In reality some of the nodes may reside on the same physical processor or even process, but the diagram depicts the logical structure in the processing of the query.

Figure 2: Link Service Resolution Architecture

There are three kinds of data that can move between nodes in this diagram: the query, link data, or the results of resolving the query. The nodes represent link processing agents, of which the link resolution agent (LRA), with local link data, is the only instance encountered so far; others will be introduced later.

In the simplest scenario, the link data is static: the query travels from left to right, is resolved at a node or nodes with the appropriate link data, and the results travel from right to left back to the user. There are three types of nodes which may exist separately or in combination:

LRAs. These resolve the query against local link data and return the result.
Caches. The same query will return the same results from the same linkbases, so it is possible for processes to cache the result of queries in order to speed up response and promote scalability.
LRA proxies. These processes appear as LRAs but propagate the query to other nodes and aggregate the responses. There are two roles for these: implementing concurrent processing of queries, and providing redundancy to cope with failure of parts of the system.
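
The three node types can be sketched as objects sharing one query method. The class names, the follow_link method and the in-memory dictionaries are assumptions made for illustration; the proxy here consults its upstream nodes one after another rather than concurrently, and treats any exception as a failed node.

```python
class LRA:
    """Link resolution agent: resolves a query against local link data."""

    def __init__(self, linkbase):
        self.linkbase = linkbase  # selection -> list of destination URLs

    def follow_link(self, selection):
        return list(self.linkbase.get(selection, []))


class Cache:
    """Remembers results, since the same query on the same linkbases
    returns the same answer."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}
        self.misses = 0

    def follow_link(self, selection):
        if selection not in self.store:
            self.misses += 1
            self.store[selection] = self.upstream.follow_link(selection)
        return list(self.store[selection])


class LRAProxy:
    """Looks like an LRA, but propagates the query and aggregates the answers."""

    def __init__(self, upstreams):
        self.upstreams = upstreams

    def follow_link(self, selection):
        results = []
        for node in self.upstreams:
            try:
                answers = node.follow_link(selection)
            except Exception:
                continue  # redundancy: a failed node is simply skipped
            for destination in answers:
                if destination not in results:
                    results.append(destination)
        return results


class UnreachableNode:
    """Stand-in for a node that is down, used only to exercise the proxy."""

    def follow_link(self, selection):
        raise ConnectionError("node down")
```

Because every node exposes the same method, the nodes can be nested to form the tree in the diagram, for example Cache(LRAProxy([LRA(...), LRA(...)])).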

The simple scenario extends naturally to the case where the link data is itself mobile. Here, the LRAs can request the link data from link data servers, which may be identical to document servers: link data is just a special document type. This means that link data can be cached using the same techniques as for the caching of document data. The link data server is then another process type in the diagram, and there is a new type of LRA which can import link data from these servers.

Finally, the LRA itself may be mobile. Instead of the data moving to the agent, the agent can then move to the data. This model is a topic of research but has not been realised in any practical DLS implementation at this time.

Note that a simple client might talk to the first process in the diagram, but more sophisticated clients are possible which incorporate the functionality of any of the process types discussed above - these are 'heavyweight clients'. For example, the client may have knowledge about which link server to contact and may itself implement some concurrency or fault tolerance. In particular, it might have link resolution functionality so that link resolution can occur when the user is offline; this is particularly appropriate when the user is using mobile equipment.

3.1 Link Control Panel

Since the interfaceless link server may require at least some configuration beyond the defaults set up by the service administrator (for example choosing applicable sets of link databases for an individual), a method for communicating with the server has been developed. This takes the form of a 'link remote controller': an HTML form displayed in a separate window, whose results are interpreted by a module in the link server and retained on the proxy's persistent storage.

Figure 3: Link Service 'Remote Controller'

The controller establishes a dynamic session (a binding of a user and host together with a set of link server parameters) which is used to control the behaviour of the link server from that point in time onwards for that particular user. It is intended that the user will invoke the controller just once to set their preferred configuration, and only again afterwards to adjust the configuration - links will always be added automatically to the documents according to the last settings of the controller.
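
A session in this sense can be sketched as a table keyed by (user, host). The setting names (linkbases, style, enabled) and the class names are hypothetical, and the sketch keeps the table in memory, whereas the text above says the settings are retained on the proxy's persistent storage.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LinkSettings:
    linkbases: tuple = ("general",)  # which link databases to apply
    style: str = "default"           # how links are presented
    enabled: bool = True             # False gives 'normal' viewing with no added links


class SessionStore:
    """Binds a (user, host) pair to the settings last submitted by the controller."""

    def __init__(self, defaults=LinkSettings()):
        self.defaults = defaults  # what the service administrator set up
        self.sessions = {}

    def configure(self, user, host, **changes):
        """Apply one submission of the controller form."""
        current = self.lookup(user, host)
        self.sessions[(user, host)] = replace(current, **changes)

    def lookup(self, user, host):
        """Settings used for every later request from this user and host."""
        return self.sessions.get((user, host), self.defaults)
```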

4. Controlling Links

The purpose of the controller is to give the user the ability to choose how links are displayed and used within the processed documents. Previous versions of the software [Carr et al 95, Carr et al 96] elevated links to first-class objects from a technical point of view, but the DLS now allows the user to directly manipulate links to control presentation and navigation. This section describes these facilities.

4.1 Link Inclusion

The control panel in figure 3 gives the user the ability to choose which of the server's installed linkbases are to be combined with requested documents, or to bypass link compilation completely if a 'normal' document viewing mode is required.

The control panel provides a greater degree of control over the linking process, enabling the user to specify in some detail which link databases are switched on and off as the user browses in and out of a number of document resources, and to control the kinds of linkbase that are used at any such point (e.g. internal navigation through a resource vs citation of documents external to the resource).

The Open Journal Framework [Carr et al 96] makes use of this kind of control panel to help the user navigate through large suites of collected but separate Internet resources, all integrated by the use of linkbases. By introducing a model of Internet resources (collections of documents and associated link databases) and aggregations of these resources (collections of collections of documents and associated link databases), it is possible to define the user's 'static location' in a document space, and hence to know what hypertext actions are applicable at each point in that document space. If the user travels outside all known resources (e.g. to a colleague's personal home page), then the option still remains to apply the most general links or else to have the link server refrain from applying any links.

Without this model the same sets of link databases are applied to any document which the user sees.
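
One way to picture the resource model is to identify each resource with a URL prefix, so that an aggregation is a shorter prefix enclosing its member resources. That mapping, the linkbase names and the function below are assumptions made for this sketch rather than a description of the Open Journal Framework.

```python
# Hypothetical resources: URL prefix -> link databases associated with it.
RESOURCES = {
    "http://journals.example.org/": ["journal-citations"],         # aggregation
    "http://journals.example.org/biology/": ["biology-glossary"],  # member resource
}
GENERAL_LINKBASES = ["general-dictionary"]


def applicable_linkbases(url, resources=RESOURCES,
                         general=GENERAL_LINKBASES, link_outside=True):
    """Decide which linkbases apply at the user's current location."""
    matched = [prefix for prefix in resources if url.startswith(prefix)]
    if not matched:
        # Outside all known resources: the most general links, or none at all.
        return list(general) if link_outside else []
    linkbases = []
    for prefix in sorted(matched, key=len):  # outermost aggregation first
        linkbases.extend(resources[prefix])
    return linkbases
```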

4.2 Link Presentation

Once a link is selected for inclusion in a document, by virtue of its presence in a chosen linkbase and its applicability to the current document (often determined by a simple keyword matching operation), the DLS inserts the link according to a specific presentation format (as seen in figure 4).

Figure 4: DLS adding bibliographic style links to a Web page

The recent standard for Cascading Style Sheets for HTML documents [Lie & Bos] allows the presentation of many document features to be controlled by visual parameters such as font, size and colour. WWW links in HTML documents are normally tightly bound to previously marked-up anchors, and so a style sheet's only option for parametrising link presentation is to change the typographic attributes of the (fixed) anchor. By contrast, the DLS has complete freedom to choose how to elaborate a link by binding it to any suitable anchor site in the document. The DLS may apply the link to any part of the document's contents, or may invent a new piece of content to act as an anchor (in the form of a distinguishing marker or a more general annotation).

DSSSL, a related standard for document styles and semantics [cite DSSSL], operates on a model in which documents are processed in two passes: firstly to rewrite and re-order their components and secondly to apply formatting operations to the revised components. This model allows new content to be created for a document as it is processed and is the kind of model which the DLS employs, in contrast to Cascading Style Sheets.

Using the controller, links can be formatted according to the following styles, given that a fragment of the document's content has been chosen as a link site:

WWW default: the content is enclosed in an <A> tag. The chosen text is displayed like any normal WWW link (determined by the browser options, but often underlined in blue).
footnote: add a footnote marker (asterisk or dagger) after the identified link site. The marker and not the text carries the web link.
citations: add an annotation which looks like a bibliographic citation after the identified link site. The marker and not the text carries the web link, but not to the ultimate destination; rather, it leads to the matching entry in a 'link bibliography' appended to the foot of the document. The entry in this pseudo-bibliography contains a link to the ultimate destination.
indirect: the link is presented in any of the above styles, but does not go directly to the ultimate destination. Instead, it invokes the link service to produce a page describing the ultimate destination (or destinations if there is more than one link which matches a given position in the document). The indirect option is particularly useful if there are likely to be a large number of links on any given key word, as it stops many pieces of content being invented to site each of the necessary direct links.
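
The four styles listed above might be rendered along the following lines. The markup details (an asterisk marker, a bracketed number, the link1-style anchor names) and the resolver address dls.example.org are hypothetical choices made for this sketch.

```python
from urllib.parse import quote


def render_link(text, url, style, number=1, indirect=False,
                resolver="http://dls.example.org/resolve?dest="):
    """Return (inline_markup, bibliography_entry) for one link site.
    The bibliography entry is None unless the citation style is used."""
    # Indirect links go to a link-service page describing the destination(s).
    target = (resolver + quote(url, safe="")) if indirect else url
    if style == "default":
        # The chosen text itself is enclosed in an <A> tag.
        return f'<A HREF="{target}">{text}</A>', None
    if style == "footnote":
        # The marker, not the text, carries the link.
        return f'{text}<A HREF="{target}">*</A>', None
    if style == "citation":
        # The marker points at an entry in the link bibliography at the foot
        # of the document; that entry links to the ultimate destination.
        inline = f'{text} <A HREF="#link{number}">[{number}]</A>'
        entry = f'<A NAME="link{number}">[{number}]</A> <A HREF="{target}">{url}</A>'
        return inline, entry
    raise ValueError(f"unknown style: {style}")
```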

Perhaps the most important role of presentation is not decoration but discrimination: a user must be easily able to distinguish between links which are added by the server and those which are native to the document (i.e. links calculated by a computer vs links inserted by the author). In addition the user would like to be able to judge the likely pertinence of each calculated link without having to follow it. For this reason the DLS can further modify the way in which links are emitted by varying their colour or font style so that, for example, brighter colours correspond to more direct and specifically authored links created by a human author, whereas duller colours correspond to general links created by a simple dictionary lookup or some statistical lexical operation.

4.3 Link Prioritisation

To allow the user to discriminate between different links, the server must have some concept of the pertinence of different kinds of links. This measurement can also be used by the server in another way: to determine how to cull links from a potentially over-annotated document.

Since at any one time there may be many dozens of link databases active, providing links of various levels of relevance, an important task of the server is to throttle over-zealous link producers and allow the user to choose the overall proportion of link items to document content, the maximum number of links to appear on each key phrase, the maximum number of links to a particular resource and which link authors to prefer.
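
A culling pass of this kind can be sketched as a greedy filter over candidates sorted by priority. Several simplifications are assumed: each candidate carries a single numeric priority (which could encode author preference), a 'resource' is taken to be the destination's host name, and a fixed max_total stands in for the proportion of links to content. All names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def cull_links(candidates, max_per_phrase=2, max_per_resource=3, max_total=None):
    """Keep the highest-priority candidates that fit within the user's limits.
    Each candidate is a dict with 'phrase', 'destination' and 'priority'."""
    kept = []
    per_phrase = Counter()
    per_resource = Counter()
    for link in sorted(candidates, key=lambda c: c["priority"], reverse=True):
        if max_total is not None and len(kept) >= max_total:
            break
        resource = urlparse(link["destination"]).netloc
        if per_phrase[link["phrase"]] >= max_per_phrase:
            continue  # this key phrase already carries enough links
        if per_resource[resource] >= max_per_resource:
            continue  # this resource is already linked to often enough
        kept.append(link)
        per_phrase[link["phrase"]] += 1
        per_resource[resource] += 1
    return kept
```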

4.4 Link Access

Thus far, the link control facilities have been concerned with rendition: what to put in a document and how. However, if links are first-class objects we can legitimately turn our attention not only to their provision but also to their usage.

The accustomed user interface for links (click and go) is convenient in some respects: uncomplicated and immediate, it allows the user to jump directly between information resources. However, it is in other ways a very unnatural activity when compared with the sequence of actions it mimics in the physical world, away from the can-do atmosphere of cyberspace.

When readers attend to a journal article in a library, they do not immediately follow each citation and cross-reference that is encountered as the text is digested. Instead, they may make a mental note to follow it up at a more convenient time, even scribbling it down on a pad. Only when the paper has been read to the readers' satisfaction will the attention then be turned to the cited material.

In other words, users have prior experience with a reading model in which they evaluate the content before they evaluate the links, whereas the WWW and most other hypertext environments provide an environment in which the user is repeatedly interrupted, stacking up unfinished document contexts to be returned to later. Although the hypertext environment itself does not force the user to switch contexts to the linked material, the lack of support for any other browsing protocol often makes it the line of least resistance.

In computing terms, the hypertext browser imposes a stack-based document evaluation modality onto the user, replacing a natural queue-based information processing methodology. This stack-based approach is impossible in the real world because of the significant time taken to change document contexts when compared to the Web.
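
The contrast between the two orders can be made concrete with a toy citation graph. The graph and function names are invented for illustration: the first function follows each link the moment it is met, the second finishes each document and then takes up noted links in the order they were noted.

```python
from collections import deque

# Toy document space: A cites B and C; B cites D.
CITES = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}


def click_and_go(start):
    """Stack-based order: each link interrupts the document being read."""
    order = []

    def visit(doc):
        order.append(doc)       # begin reading doc
        for cited in CITES[doc]:
            visit(cited)        # doc is suspended until the cited one is done

    visit(start)
    return order


def read_then_follow(start):
    """Queue-based order: finish each document, then follow the noted links."""
    order = []
    pending = deque([start])
    while pending:
        doc = pending.popleft()
        order.append(doc)
        pending.extend(CITES[doc])  # jot the citations down for later
    return order
```

In the stack-based order, document D is opened before the reader has reached A's second citation; in the queue-based order, both of A's citations are read before anything they cite.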

As well as providing mechanisms for controlling the prioritising and presentation of links, the DLS supplies a mechanism to help control the link following process, making it more like the real-world experience described above. It does this by providing an auxiliary "navigation planner window" adjacent to the user's browser window, such that users can drag link anchors from the browser window onto the planner window, where they are displayed as icons. The icons can be moved around the window, clustered together according to the user's own informal classification scheme and subsequently double-clicked to make the browser display the relevant Web document.

Figure 5: DLS link access auxiliary window

As such, an electronic notepad has been created for jotting down interesting places to visit (a speculative bookmark list). However, a secondary function of the notepad is to pre-fetch the referenced documents while the user finishes browsing the main document, so that the reader really does get instantaneous access when the follow-up texts are examined. (In fact, the referenced URL and all embedded data must be fetched, so that documents containing frames or images will display without delay.)
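
The pre-fetching behaviour can be sketched with a background thread pool. The class below is a hypothetical illustration, not the DLS planner: the fetch function is supplied by the caller, embedded images and frames are not fetched, and noted links are opened strictly in the order they were noted, whereas the real window lets the user pick any icon.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class NavigationPlanner:
    """Notepad of links to visit later; fetching starts as soon as a link is noted."""

    def __init__(self, fetch, workers=4):
        self._fetch = fetch  # callable taking a URL and returning its content
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = deque()

    def note(self, url):
        """Jot the link down and begin fetching it in the background."""
        future = self._pool.submit(self._fetch, url)
        self._pending.append((url, future))

    def open_next(self):
        """Return (url, content) for the oldest noted link, waiting only if
        its fetch has not finished yet."""
        url, future = self._pending.popleft()
        return url, future.result()

    def __len__(self):
        return len(self._pending)
```

A caller might pass a function built on urllib.request.urlopen as `fetch`; a stub function works equally well for experimenting with the ordering.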

A further function of the navigation notepad is to contextualise navigation, i.e. to make explicit the context in which the current document and its linked items are being read. It embodies the notion that reading is done not in an intellectual vacuum, but as part of a process of writing, of note-taking and of goals and strategies for creating other documents.

5. Related Work

The concept of external link services has been a familiar part of the hypertext research community for many years, especially in the area of Open Hypermedia Systems (OHS). The WWW is a closed system according to OHS classifications because of its lack of independent link facilities, even though it has an otherwise open architecture. According to the flag taxonomy of open hypermedia systems [Østerbye & Wiil] the DLS acts as a session manager, taking on the responsibility of link availability, i.e. controlling which links should appear as link markers. Link activation, the other normal responsibility of a session manager, is usually handled by the WWW browser itself, although it is possible to make the DLS perform this task by selecting 'indirect' links from the control panel in figure 3.

Some research in the WWW community [Brooks et al 95] has focussed on the use of transducers which intercept the flow of communication between a client and server, modifying the request or response in some way. Such transducers have been used to experiment with adding extra functionality to the document server in the form of annotations, indexes and change marks [Meeks et al]. The DLS server acts very similarly, modifying the WWW server response (the document) by adding extra data to it, but it does so as a 'mutant proxy server' rather than an extra processing node on the communications stream. The DLS breaks the transducer model by allowing the client to communicate directly with it by using the 'Link Remote Controller' described in section 3.1.

Others have investigated the use of independent meta-information servers but to provide collaborative annotations for WWW pages rather than links [Röscheisen et al]. This is actually quite similar in concept to the DLS, as links can be easily provided within an annotation framework and vice versa. The DLS itself provides support for annotations through the inclusion of extra metadata in the link databases.

Early work on spatial metaphors for organising and classifying hypertext material [Marshall 91] inspired a prototype, by one of the authors [Carr 95], of the link access facilities described in the Link Access section above. Further work in the hypertext community on the use of spatial hypertext [Marshall 93, Marshall 94] has resulted in a commercial system (Web Squirrel [Bernstein 96]) which provides improved link access facilities for the WWW.

The WAIBA project of the OSF [Brooks 96] also produced software tools for improving link access for users of the Web. In particular, a Table of Contents agent produced a structural overview of a Web hierarchy to help the user make decisions about how to browse that part of the information space.

6. Conclusions & Future Work

A hypermedia link service provides important functionality for any information system, including ease of information maintenance and enhanced authoring capability. We have shown that a simple link service can be implemented using standard Web browsers and servers, without the need for additional client software. We have also described architectures, based on proxies and redirection, which permit a link service to scale to multiple hosts, and shown how a link service can provide novel user-centred facilities for link presentation, discrimination and navigation.

We are continuing to develop the Distributed Link Service following our open hypermedia philosophy, adopting new browser and server technologies as they become available. Future work on the network protocols includes an investigation of client-side link resolution (the 'heavyweight client'), link caching on proxies and multicasting to multiple link servers.

On the user-interface side, user trials are scheduled to determine the practical usefulness of link discrimination by colour in a world of varying browser, rendition and screen hardware technologies.

Tools which utilise the link service are being designed within specific projects, and we hope to make generic tools available in the future.

Acknowledgements

The work described in this paper was partially supported by JISC grant ELP2/35 and a UK ROPA award.

Bibliography

[Bernstein 96] Bernstein M. Eastgate Web Squirrel FAQ <URL: http://www.eastgate.com/squirrel/FAQ.html>

[Brooks et al 95] Application-Specific Proxy Servers as HTTP Stream Transducers, C. Brooks, M. Mazer, S. Meeks and J. Miller, Proceedings of The Web Revolution: Fourth International World Wide Web Conference, in The Web Journal 1(1), O'Reilly and Associates.

[Brooks 96] Wide Area Information Browsing Assistance Final Technical Report, C. Brooks, Technical Report, The Open Group Research Institute, 20 September 1996. <URL: http://www.osf.org/www/waiba/papers/y2report/y2report.htm>

[Carr 95] Structure in Text and Hypertext, L. Carr, PhD Thesis, University of Southampton, UK (1995). <URL: http://journals.ecs.soton.ac.uk/lacethesis/>

[Carr et al 95] The Distributed Link Service: A Tool for Publishers, Authors and Readers, L. Carr, D. De Roure, W. Hall and G. Hill, Proceedings of The Web Revolution: Fourth International World Wide Web Conference, in The Web Journal 1(1), O'Reilly and Associates.

[Carr et al 96] Open Linking Services, L. Carr, D. De Roure, W. Hall and G. Hill, Proceedings of the Fifth International World Wide Web Conference.

[Davis et al 92] H. Davis, W. Hall, I. Heath, G. Hill, R. Wilkins, Towards an Integrated Information Environment with Open Hypermedia Systems, in ECHT '92, Proceedings of the Fourth ACM Conference on Hypertext, Milan, Italy, November 30-December 4, 1992, ACM Press, 181-190.

[Davis et al 94] H. Davis, S. Knight, W. Hall, Light Hypermedia Link Services: A Study of Third Party Application Integration, in Proceedings of the Sixth ACM Conference on Hypertext, Edinburgh, Scotland, September 1994, ACM Press, 41-50.

[Davis et al 95] H. Davis, A. Lewis, A. Rizk, OHP: A Draft Proposal for an Open Hypermedia Protocol, presented at ACM Hypertext 96 Conference, Open Hypermedia Systems Workshop, <URL: http://diana.ecs.soton.ac.uk/~hcd/protweb.htm>

[De Roure et al 96] A Distributed Hypermedia Link Service, D. De Roure, L. Carr, W. Hall and G. Hill, Proceedings of the Third International Workshop on Services in Distributed and Networked Environments (SDNE96), IEEE Computer Society Press, 1996.

[De Roure et al 96b] Agents for Distributed Multimedia Information Management, D. De Roure, W. Hall, H. Davis and J. Dale, Proceedings of PAAM'96

[Grønbæk & Trigg] Toward a Dexter-based reference model for open hypermedia: Unifying embedded references and link objects, K. Grønbæk, R. Trigg, in the Proceedings of the Seventh ACM Conference on Hypertext, 1996.

[Hall 94] W. Hall, Ending the Tyranny of the Link, IEEE Multimedia 1,1 pp 60-68 (1994).

[Hill et al 93] G. Hill, R. Wilkins, W. Hall, Open and Reconfigurable Hypermedia Systems: A Filter Based Model, Hypermedia, 5(2), 1993.

[Hill et al 95] Applying Open Hypertext Principles to the WWW, G. Hill, W. Hall, D. De Roure, L. Carr, International Workshop on Hypermedia Design 1995, Montpellier, France, 1-2 June 1995.

[Lie & Bos] Cascading Style Sheets: Designing for the Web, Hakon Wium Lie, Bert Bos, Addison-Wesley Pub Co, ISBN: 020141998X, 1997.

[Malcolm et al 91] Malcolm, K.C., Poltrock, S.E., Schuler, D. Industrial Strength Hypermedia: Requirements for a Large Engineering Enterprise. In: Hypertext 91: Proceedings of Third ACM Conference on Hypertext, San Antonio, TX. ACM Press, 1991, 13-24.

[Marshall 91] Marshall, C.C., Halasz F.G., Rogers R.A., and Janssen, W.C., Aquanet: a hypertext tool to hold your knowledge in place. In Proceedings of Hypertext '91, (San Antonio, Texas December 16-18), 1991, pp. 261-275.

[Marshall 93] Marshall, C.C., Shipman, F. M. III. "Searching for the Missing Link: Discovering Implicit Structure in Spatial Hypertext." In Proceedings of Hypertext '93, (Seattle, Washington, November 14-18), 1993, pp. 217-230.

[Marshall 94] Marshall, C.C.; Shipman, F.M.; Coombs, J.H. VIKI: Spatial Hypertext Supporting Emergent Structure. In Proceedings of the ACM European Conference on Hypermedia Technologies (Edinburgh, Scotland, Sept. 18-23), 1994, pp. 13-23.

[Meeks et al] Transducers and Associates: Circumventing the Limitations of the World Wide Web, W. Meeks, C. Brooks, M. Mazer, in Proceedings of the Conference on Emerging Technologies and Applications in Communications 96, Portland, Oregon.

[Østerbye & Wiil] The Flag Taxonomy of Open Hypermedia Systems, K. Østerbye, U. Wiil, in the Proceedings of the Seventh ACM Conference on Hypertext, 1996.

[Pearl 89] Pearl, A. Sun's Link Service: A Protocol for Open Linking. In: Hypertext '89 Proceedings, Pittsburgh PA, 1989, 137 - 146

[Röscheisen et al] Shared Web Annotations as a Platform for Third-party Value-Added Information Providers: Architecture, Protocols and Usage Examples M. Röscheisen, C. Mogensen, T. Winograd, Technical Report CSDTR/DLTR, Computer Science Department, Stanford University, Stanford, CA 94305, USA.
<URL: http://www-diglib.stanford.edu/rmr/TR/TR.html>