Linking the World Wide Web and Microcosm

Wendy Hall, Les Carr and David De Roure

1. Introduction
2. Hypertext in the Large
3. Navigation and Linking in WWW
4. The Microcosm Link Model
5. Linking Microcosm with WWW
6. Conclusions
7. Bibliography

1. Introduction

Designers of hypermedia systems have recognised the need to move away from closed systems to open environments which separate the link structure from the data in the system, and enable separate link and data processing. The main motivations behind this development are the need to reduce authoring effort in large-scale hypermedia applications and to make them more easily modifiable, customisable and extensible. Microcosm is one such open hypermedia system which has been developed at the University of Southampton. At the heart of Microcosm is the Link Service, which allows links maintained by the system to be applied to information native to third-party applications in the host environment.

The World Wide Web is an open system: its formats and protocols are well-documented and are negotiated in an international open forum. However, its current use as a hypermedia system is closed in the sense that the link information is hidden within a document's data, working against the aims of large-scale hypermedia mentioned above. Since this 'closedness' is not a fundamental design feature of the Web but a consequence of current practice in Web document design, it is entirely possible to augment the technology to provide the kind of link service described above. This article discusses ways of combining link service capabilities with the World Wide Web.

2. Hypertext in the Large

The Microcosm open hypermedia system and link service is now well-known and well-documented. Details of the model and architecture can be found in [Fountain et al 90, Davis et al 92, Hill et al 93]. Although Microcosm was designed to deal with hypertext on a large scale, it was implemented in the context of a personal workstation environment. It has lacked the mechanisms to deal with a distributed document set, although these are being actively developed. In contrast, the World Wide Web (WWW) addresses the issue of hypertext in a global context: not just a single text or group of intimately related texts, but hypertext as a universal literature resource [Berners-Lee et al 92].

The experience that most other hypertext systems provide is in the realm of individual documents or local document collections with a controlled environment and context. The design of the WWW project has kept the node and links model of these traditional closed hypertext systems intact but extended the node addressing scheme to allow remote nodes and defined a node transport mechanism to allow the hypertext to be extended across a network.

This simple node-links model and the familiar authoring paradigm that accompanies it have particular implications for the scaleability and maintainability of a very large and highly distributed corpus.

3. Navigation and Linking in WWW

Navigation of the Web by a user is undertaken in one of two ways: by following links from one document to another, or by going directly to a document whose address is already known. The former mechanism requires the user to follow semantic cues in the contents of the documents in order to repeatedly choose the correct links to follow. The latter requires the user to make use of an already-known address: apart from link following, it is only possible to navigate to a document if you have already been there, or if you are provided with a handle to it by its author or by someone else who has been there. This is then a pure link-following environment, without recourse to text searches or comprehensive document catalogues; it is almost impossible to navigate the Web with the aim of finding all documents about a particular topic.

The problem of topic-based navigation of the Web is similar to the problem of finding a file on a particular subject on the Internet's anonymous FTP service. In that environment, enthusiastic volunteers at first published regular lists of sites and the kinds of files held at each site. Some sites also used to provide a file containing a complete list of all the files available from their machine. Eventually a single site provided a database of the names of files available at all of the well-known anonymous FTP sites; an interactive query service (known as archie) allowed any user to find out where a file was archived given a fragment of that file's name. This service has now been replicated at several dozen sites across the Internet, so that any user can obtain a list of potentially relevant files as long as the name of each file is indicative of its contents. A similar system could be applied to the Web; software is already available to allow an administrator to automatically catalogue each of a Web server's files.
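The archie-style lookup described above can be sketched in a few lines. This is only an illustration of matching a fragment against file names, not archie's actual implementation; the catalogue entries, host names and the function name find_files are all invented for the example.

```python
# Hypothetical catalogue of (site, path) pairs, of the kind an archie-like
# service might collect. Host names and paths are invented for illustration.
CATALOGUE = [
    ("ftp.example.org", "/pub/hypertext/microcosm-intro.txt"),
    ("ftp.example.org", "/pub/images/logo.gif"),
    ("archive.example.net", "/docs/www/hypertext-faq.txt"),
]


def find_files(fragment, catalogue=CATALOGUE):
    """Return (site, path) pairs whose file name contains the fragment."""
    fragment = fragment.lower()
    matches = []
    for site, path in catalogue:
        # Only the final path component (the file name) is searched.
        filename = path.rsplit("/", 1)[-1]
        if fragment in filename.lower():
            matches.append((site, path))
    return matches
```

Searching for "hypertext" here finds only the FAQ file: the introduction to Microcosm is missed because its file name does not mention the topic, which is exactly the limitation noted above.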

Link fossilisation is a significant disadvantage of WWW and occurs because link specifications have to be published as part of the document and cannot be changed without revising the document. Link decay is also seen, since links refer to their destination anchors via a specific machine name and path name. Any change to the position of the destination requires every source document which refers to it to be changed: in effect, once published, a document can never be moved or deleted. Although this is not an insurmountable problem in a locally controlled context, WWW used as a world-wide publishing mechanism assumes that every document is forever associated with its published address. Dead ends frequently occur in WWW because only native WWW documents can have embedded links. If traversing a link leads to a foreign document being displayed by a foreign application (for example, a spreadsheet file displayed by Excel) then no WWW links may be followed from it.

4. The Microcosm Link Model

A generic link, the most common link type in Microcosm hypertexts, allows the author to associate a document with any occurrence of a particular textual string in any document. At first sight this may seem to be just a text retrieval operation, but there are certain key differences. Firstly, from a practical point of view, a generic link requires no indexing of the possible destination documents, nor a searching operation on every document in the hypertext in order to satisfy the link: a generic link has none of the overheads associated with text searching. Secondly, the difference between generic links and text retrieval is the difference between intentional and non-intentional hypertexts: a link expresses an author's knowledge of a relationship between the meaning of two entities in the hypertext, whereas a text retrieval operation expresses a statistical similarity in textual features of two hypertext entities. It is possible to liken a generic link to a text retrieval operation in reverse: a generic link defines a collection of applicable sources, whereas a text retrieval operation describes a collection of applicable destinations.

The flexibility of Microcosm link sources provides a reversed hypertext authoring paradigm: which other nodes may be linked to the current node, instead of where can the current node lead to? Effectively, the author, using the generic link mechanism, is labelling the document with key words or key phrases. Thus the authoring paradigm has become declarative in nature, describing the data rather than the processes involved in document links.
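A minimal sketch may make the generic link idea concrete. It assumes a linkbase held as a simple list separate from the documents; the class GenericLink, the functions follow_link and applicable_sources, and the file names are hypothetical and do not reflect Microcosm's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenericLink:
    phrase: str        # source: any occurrence of this text, in any document
    destination: str   # the document the author associates with the phrase


# Hypothetical linkbase, stored apart from the documents it applies to.
LINKBASE = [
    GenericLink("hypertext", "docs/hypertext-overview.txt"),
    GenericLink("link service", "docs/link-service.txt"),
]


def follow_link(selection, linkbase=LINKBASE):
    """Return destinations of links whose phrase matches the user's selection."""
    wanted = selection.strip().lower()
    return [link.destination for link in linkbase
            if link.phrase.lower() == wanted]


def applicable_sources(document_text, linkbase=LINKBASE):
    """Return the linkbase phrases that occur somewhere in a document."""
    text = document_text.lower()
    return [link.phrase for link in linkbase if link.phrase.lower() in text]
```

Following a link consults only the linkbase and never searches or indexes the destination documents, while applicable_sources shows the 'reverse retrieval' view: a link declares which pieces of text, in any document, may act as its source.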

Hypertext packages are frequently difficult to author in a scaleable or generic fashion which allows for expansion or economic re-use for different purposes. The links, authored for a particular purpose, are fixed inside the document content and fixed to specific destinations.

Updating a Microcosm hypertext by adding new nodes involves one of two scenarios. If the nodes are new general resources (primary materials) then a group of new generic links must be added which will retrospectively apply to the existing hypertext components. If instead they are new secondary materials (e.g. student essays or teacher commentaries on the primary materials) then they will already be affected by the existing links. In this respect the Microcosm hypertext model is incrementally scaleable.

Changing the purpose of the hypertext may involve keeping the collection of nodes substantially the same, but reworking links to provide different structures of access. In many hypertext environments including the Web, changing the links means rewriting the texts because the links are embedded in the texts. In Microcosm it simply means applying a new set of linkbases to the same material, in a similar way to Intermedia's use of webs. Another advantage of Microcosm is that material which is added during the repurposing process will be automatically affected by any retained linkbases. Since many hypertext environments provide embedded point-to-point linking (i.e. from here you can go here) they fail to offer such expandability or maintainability.
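The effect of swapping linkbases over unchanged material can be sketched as follows. The dictionaries, file names and the function resolve are assumptions made for illustration only; a real linkbase holds considerably more than a phrase-to-destination table.

```python
def resolve(selection, linkbases):
    """Collect destinations for a selection from every active linkbase, in order."""
    key = selection.strip().lower()
    destinations = []
    for linkbase in linkbases:
        destinations.extend(linkbase.get(key, []))
    return destinations


# Hypothetical linkbases: each maps a phrase to a list of destination documents.
GLOSSARY = {"anchor": ["glossary/anchor.txt"]}
TUTORIAL = {"anchor": ["tutorial/lesson3.txt"]}
REFERENCE = {"anchor": ["manual/anchors.txt"],
             "linkbase": ["manual/linkbases.txt"]}

# The same selection resolved for two different purposes; no document is edited.
teaching_view = resolve("anchor", [GLOSSARY, TUTORIAL])
reference_view = resolve("anchor", [GLOSSARY, REFERENCE])
```

Because the glossary linkbase is retained in both configurations, any document added later that mentions "anchor" is covered by it without further authoring.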

As a particular example of the advantages of this authoring paradigm, consider setting up a multiple-choice test based on material in a standard course text. In a normal environment containing only specific links between nodes, for each possible wrong link (i.e. wrong answer) a separate correcting explanation must be written for the user, recalling the material in the original sources. Using Microcosm, the question, the text of each answer and any explanations written will automatically be linked back to the concepts in the original sources.

5. Linking Microcosm with WWW

The World-Wide Web is characterised by a number of components: (i) a single, well-defined native data format for use with a document viewer, (ii) a universal addressing scheme with associated transfer protocol and (iii) a hypertext authoring scheme in which precise destination addresses of links are specified as part of the source documents. In comparison with WWW, Microcosm is characterised by (i) a co-operative framework for diverse document viewers and (ii) a hypertext authoring strategy which is based on generic relationships between source and destination documents.

Microcosm does not suffer from some of the problems of the Web. Dead ends do not occur because almost any program can be used as a Microcosm viewer for many different kinds of data: links can be followed not only between text and graphic files, but between word processed documents, CAD documents, spreadsheets, databases, video documents and simulations etc. Links do not get fossilised because they are not embedded in the documents to which they refer, and they are less prone to decay because they represent rules for linking sets of documents together, rather than specific hardwired document references.

We are experimenting with various approaches to combining Microcosm and the World-Wide Web. The first approach is to treat the Web as just another application which the user can control from their personal information environment and in which Microcosm acts as the 'glue', linking together information from Web pages and local documents. In this scenario the local information environment, controlled by Microcosm, is the primary focus and the WWW viewer co-operates to provide the usual link following and authoring services to the user, so that the user can follow Microcosm links to and from Web pages as well as clicking on buttons in Web pages.

The second approach is to treat the two environments as distinct, but to provide a conversion from sets of Microcosm hypertexts into the appropriate Web formats. This allows the hypertext author to create documents and links in Microcosm's flexible environment and then have them compiled together with the documents by mcm2html into sets of static HTML files for the Web. The end user then sees only a set of WWW documents with embedded buttons for navigation.
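The compilation step can be illustrated with a short sketch. It is not the mcm2html program itself, whose behaviour is not described here; it assumes a linkbase that simply maps phrases to target URLs, and the function name compile_to_html is hypothetical.

```python
import html
import re


def compile_to_html(text, linkbase):
    """Return an HTML fragment in which each linkbase phrase is an embedded link."""
    if not linkbase:
        return html.escape(text)
    # Try longer phrases first so that they win over shorter overlapping ones.
    phrases = sorted(linkbase, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(p) for p in phrases), re.IGNORECASE)
    lookup = {phrase.lower(): url for phrase, url in linkbase.items()}

    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Copy the text before the match, then emit the match as an anchor.
        parts.append(html.escape(text[last:match.start()]))
        href = html.escape(lookup[match.group(0).lower()], quote=True)
        parts.append('<a href="%s">%s</a>' % (href, html.escape(match.group(0))))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

This simplification matches phrases anywhere, including inside longer words, and freezes the links at compile time, so the resulting static files regain the fossilisation problem of ordinary Web documents until they are recompiled.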

A third approach is to provide Microcosm's flexible link services to WWW users who do not have a local Microcosm environment. This is achieved by mimicking both Microcosm's architecture and external link databases in the Web. The Microcosm architecture consists of individual requests marshalled through a chain of processes: these requests are implemented as HTTP messages, received by a CGI script and then routed through a set of processes on the server. Each of these processes may try to satisfy a link request by accessing a particular link database, or by matching some data in an external resource such as a dictionary or a set of manual pages. Any results get returned to the user as an answer to the HTTP request.
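The chain of processes can be sketched as a list of functions through which each request is routed. The HTTP and CGI layers are omitted, and the names linkbase_filter, dictionary_filter and route_request, together with the sample data, are assumptions for illustration rather than the server's real interfaces.

```python
def linkbase_filter(linkbase):
    """Build a filter that answers a request from one link database."""
    def handle(selection):
        return list(linkbase.get(selection.lower(), []))
    return handle


def dictionary_filter(entries):
    """Build a filter that matches the selection against an external resource."""
    def handle(selection):
        word = selection.lower()
        if word in entries:
            return ["dictionary/" + word]
        return []
    return handle


def route_request(selection, chain):
    """Pass a follow-link request through every filter and gather the results."""
    results = []
    for handle in chain:
        for destination in handle(selection):
            if destination not in results:   # drop duplicate destinations
                results.append(destination)
    return results


# A hypothetical chain: one link database followed by a dictionary lookup.
CHAIN = [
    linkbase_filter({"anchor": ["docs/anchor.html"]}),
    dictionary_filter({"anchor", "node"}),
]
```

Each filter is independent of the others, so a new source of links can be added by appending one more function to the chain.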

The user's interface to this third approach is in the form of an adjunct to the standard WWW browser, an icon which allows the user to bring up a menu of link options (follow/create/show links). This icon may be attached to the browser's title bar itself (as in the Microsoft Windows version) or may be a part of the desktop (the X11 version), but it is required to allow the user access to Microcosm's selection/action link following paradigm: the user may select a piece of text in the WWW client (or any other) window and choose follow link from the adjunct's menu. The adjunct causes the WWW browser to send an HTTP request to the server (using the CCI standard, if available) and after a short delay the client receives an HTML document with a list of possible destinations that were determined by the server. This document (titled "Available Links") contains a set of standard HTML buttons which link to the destinations given in the link databases, and allows the user to choose from the set of possible destinations.
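The document returned to the client might be generated along the following lines. Only the title "Available Links" comes from the description above; the page layout, the function name available_links_page and the example URL are assumptions.

```python
import html


def available_links_page(selection, destinations):
    """Render an "Available Links" HTML document for the user's selection.

    destinations is a list of (url, label) pairs determined by the server.
    """
    lines = [
        "<html>",
        "<head><title>Available Links</title></head>",
        "<body>",
        "<h1>Available Links</h1>",
        "<p>Links for: %s</p>" % html.escape(selection),
        "<ul>",
    ]
    for url, label in destinations:
        lines.append('<li><a href="%s">%s</a></li>'
                     % (html.escape(url, quote=True), html.escape(label)))
    lines += ["</ul>", "</body>", "</html>"]
    return "\n".join(lines)
```

Because the result is an ordinary HTML document, the browser needs no special support to display it: the user simply follows one of the listed links in the usual way.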

6. Conclusions

The flexibility of the Microcosm model opens up a wide range of possibilities for determining links. We are currently experimenting with the development of many different types of filters to generate links automatically, reducing authoring effort, and to create links dynamically according to different algorithms. For example, we are using rule-based algorithms to create an intelligent agent device. The integration with visualisation systems such as AutoCAD is allowing us to experiment with different metaphors for the development of user interfaces to large sets of multimedia data.

An open hypermedia system like Microcosm has an intrinsically different feel from closed hypermedia systems. The onus is on the user to interrogate the system in order to ask for more information rather than expecting the system to announce to the user that there is more information about a particular subject. This more flexible model allows us to customise the hypertext environment to the user's needs [Hall 94].

Currently, in order to access a piece of information on the Web it is necessary either to know its address or to be able to find a document that contains a link which references it. In an environment which has no alternative methods of navigation (e.g. a hierarchical structure) this can cause considerable problems, especially if documents are revised. Although a problem in a localised hypertext environment, this is especially significant in a global, uncoordinated information system. Using generic links the reader can instead select any relevant text to act as a link to the required information.

Similarly, WWW authors would have greater freedom in the authoring process: instead of providing explicit buttons for navigation to every relevant piece of material, generic and other dynamically generated links can be used to provide a range of services across a whole domain of information.

7. Bibliography

Berners-Lee, T. et al. World Wide Web: the Information Universe. Electronic Networking 2, 1, pp 52-58 (1992).
Davis, H.C., Hall, W., Heath, I., Hill, G.J. & Wilkins, R.J. Towards an Integrated Information Environment with Open Hypermedia Systems. In Proceedings of ECHT92, ACM Press, pp 181-190 (1992).
Fountain, A.M., Hall, W., Heath, I. & Davis, H.C. Microcosm: An Open Model for Hypermedia with Dynamic Linking. In Proceedings of ECHT90, Cambridge University Press, pp 298-311 (1990).
Hall, W. Ending the Tyranny of the Link. IEEE Multimedia 1, 1, pp 60-68 (1994).
Hill, G.J., Wilkins, R.J. & Hall, W. Open and Reconfigurable Hypermedia Systems: A Filter Based Model. Hypermedia 5, 2, pp 103-118 (1993).

Wendy Hall, Leslie Carr & David De Roure
Multimedia Research Group
Department of Electronics and Computer Science
University of Southampton
Southampton SO17 5BJ